Re: RAID5 doesn't mount on boot, but you can afterwards?

2015-09-30 Thread Sjoerd
Hi, my fstab looks as follows (NB: I added the recovery option to see if
that would help, which it didn't). The boot disk (and @home) is an SSD, and
the label STORAGE represents the RAID5 array:


# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
#
# / was on /dev/sdc1 during installation
UUID=0ea60d4d-3f34-4451-8272-442fcccb7f2e /   btrfs   recovery,noatime,nodiratime,subvol=@ 0   1

# /home was on /dev/sdc1 during installation
UUID=0ea60d4d-3f34-4451-8272-442fcccb7f2e /home   btrfs   recovery,noatime,nodiratime,subvol=@home 0   2


# STORAGE
LABEL=STORAGE   /data/HOME        btrfs   recovery,noatime,nodiratime,compress=zlib,subvol=@home_dir    0   2
LABEL=STORAGE   /data/Pictures    btrfs   recovery,noatime,nodiratime,compress=zlib,subvol=@pictures    0   2
LABEL=STORAGE   /data/Multimedia  btrfs   recovery,noatime,nodiratime,compress=zlib,subvol=@multimedia  0   2
LABEL=STORAGE   /data/docker      btrfs   recovery,noatime,nodiratime,compress=zlib,subvol=@docker      0   2
LABEL=STORAGE   /data/vms         btrfs   recovery,noatime,nodiratime,compress=zlib,subvol=@vms         0   2
LABEL=STORAGE   /data/Downloads   btrfs   recovery,noatime,nodiratime,compress=zlib,subvol=@downloads   0   2
LABEL=STORAGE   /data/Backups     btrfs   recovery,noatime,nodiratime,compress=zlib,subvol=@backups     0   2
LABEL=STORAGE   /data/Software    btrfs   recovery,noatime,nodiratime,compress=zlib,subvol=@software    0   2
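
A side note on the listing above: wrapped or fused lines are easy to miss
in an fstab, and a malformed entry is one of the things worth ruling out
when a mount fails at boot. The sketch below is only a sanity check, not a
diagnosis of the RAID5 problem. It assumes every entry spells out all six
fields, as the file above does (fstab(5) allows the last two to be
omitted), and check_fstab_fields is a made-up helper name.

```shell
# check_fstab_fields [FILE]: print every entry that does not have exactly
# six whitespace-separated fields. Hypothetical helper, not an existing tool.
# FILE defaults to /etc/fstab; pass "-" to read standard input.
check_fstab_fields() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        NF != 6 { printf "line %d: %d fields: %s\n", NR, NF, $0 }
    ' "${1:-/etc/fstab}"
}
```

Run it as `check_fstab_fields /etc/fstab`; no output means every entry has
six fields.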




On September 30, 2015 9:04:39 PM Leonidas Spyropoulos wrote:

> Hello,
>
> On 30/09/15, Sjoerd wrote:
> > Hi All,
> >
> > A RAID5 setup on raw devices doesn't want to automount on boot.
> > [..]
>
> Post your /etc/fstab file please.
>
> Thanks
> --
> Sent using mutt
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





[PATCH v5 6/9] vfs: Copy should use file_out rather than file_in

2015-09-30 Thread Anna Schumaker
The way to think about this is that the destination filesystem reads the
data from the source file and processes it accordingly.  This is
especially important to avoid an infinite loop when doing a "server to
server" copy on NFS.

Signed-off-by: Anna Schumaker 
---
 fs/read_write.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 8e7cb33..6f74f1f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1355,7 +1355,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
if (!(file_in->f_mode & FMODE_READ) ||
!(file_out->f_mode & FMODE_WRITE) ||
(file_out->f_flags & O_APPEND) ||
-   !file_in->f_op || !file_in->f_op->copy_file_range)
+   !file_out->f_op || !file_out->f_op->copy_file_range)
return -EBADF;
 
inode_in = file_inode(file_in);
@@ -1378,8 +1378,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
if (ret)
return ret;
 
-   ret = file_in->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
-len, flags);
+   ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
+ len, flags);
if (ret > 0) {
fsnotify_access(file_in);
add_rchar(current, ret);
-- 
2.6.0



[PATCH v5 3/9] btrfs: add .copy_file_range file operation

2015-09-30 Thread Anna Schumaker
From: Zach Brown 

This rearranges the existing COPY_RANGE ioctl implementation so that the
.copy_file_range file operation can call the core loop that copies file
data extent items.

The extent copying loop is lifted up into its own function.  It retains
the core btrfs error checks that should be shared.

Signed-off-by: Zach Brown 
[Anna Schumaker: Make flags an unsigned int]
Signed-off-by: Anna Schumaker 
Reviewed-by: Josef Bacik 
Reviewed-by: David Sterba 
---
v5:
- Make flags variable an unsigned int
---
 fs/btrfs/ctree.h |  3 ++
 fs/btrfs/file.c  |  1 +
 fs/btrfs/ioctl.c | 91 
 3 files changed, 56 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe3..0046567 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3996,6 +3996,9 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
  loff_t pos, size_t write_bytes,
  struct extent_state **cached);
 int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
+ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
+ struct file *file_out, loff_t pos_out,
+ size_t len, unsigned int flags);
 
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b823fac..b05449c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2816,6 +2816,7 @@ const struct file_operations btrfs_file_operations = {
 #ifdef CONFIG_COMPAT
.compat_ioctl   = btrfs_ioctl,
 #endif
+   .copy_file_range = btrfs_copy_file_range,
 };
 
 void btrfs_auto_defrag_exit(void)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0adf542..d3697e8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3727,17 +3727,16 @@ out:
return ret;
 }
 
-static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
-  u64 off, u64 olen, u64 destoff)
+static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
+   u64 off, u64 olen, u64 destoff)
 {
struct inode *inode = file_inode(file);
+   struct inode *src = file_inode(file_src);
struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct fd src_file;
-   struct inode *src;
int ret;
u64 len = olen;
u64 bs = root->fs_info->sb->s_blocksize;
-   int same_inode = 0;
+   int same_inode = src == inode;
 
/*
 * TODO:
@@ -3750,49 +3749,20 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 *   be either compressed or non-compressed.
 */
 
-   /* the destination must be opened for writing */
-   if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
-   return -EINVAL;
-
if (btrfs_root_readonly(root))
return -EROFS;
 
-   ret = mnt_want_write_file(file);
-   if (ret)
-   return ret;
-
-   src_file = fdget(srcfd);
-   if (!src_file.file) {
-   ret = -EBADF;
-   goto out_drop_write;
-   }
-
-   ret = -EXDEV;
-   if (src_file.file->f_path.mnt != file->f_path.mnt)
-   goto out_fput;
-
-   src = file_inode(src_file.file);
-
-   ret = -EINVAL;
-   if (src == inode)
-   same_inode = 1;
-
-   /* the src must be open for reading */
-   if (!(src_file.file->f_mode & FMODE_READ))
-   goto out_fput;
+   if (file_src->f_path.mnt != file->f_path.mnt ||
+   src->i_sb != inode->i_sb)
+   return -EXDEV;
 
/* don't make the dst file partly checksummed */
if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
-   goto out_fput;
+   return -EINVAL;
 
-   ret = -EISDIR;
if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
-   goto out_fput;
-
-   ret = -EXDEV;
-   if (src->i_sb != inode->i_sb)
-   goto out_fput;
+   return -EISDIR;
 
if (!same_inode) {
btrfs_double_inode_lock(src, inode);
@@ -3869,6 +3839,49 @@ out_unlock:
btrfs_double_inode_unlock(src, inode);
else
mutex_unlock(&src->i_mutex);
+   return ret;
+}
+
+ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
+ struct file *file_out, loff_t pos_out,
+ size_t len, unsigned int flags)
+{
+   ssize_t ret;
+
+   ret = btrfs_clone_files(file_out, file_in, pos_in, len, pos_out);
+   if (ret == 0)
+   ret = len;
+   return ret;
+}
+
+static noinline long btrfs_ioctl_clone(struct 

[PATCH v5 0/9] VFS: In-kernel copy system call

2015-09-30 Thread Anna Schumaker
Copy system calls came up during Plumbers a while ago, mostly because several
filesystems (including NFS and XFS) are currently working on copy acceleration
implementations.  We haven't heard from Zach Brown in a while, so I volunteered
to push his patches upstream so individual filesystems don't need to keep
writing their own ioctls.

This posting fixes a few issues that popped up after I submitted v4 yesterday.

Changes in v5:
- Bump syscall number (again)
- Add sys_copy_file_range() to include/linux/syscalls.h
- Change flags parameter on btrfs to an unsigned int


Anna Schumaker (6):
  vfs: Copy should check len after file open mode
  vfs: Copy shouldn't forbid ranges inside the same file
  vfs: Copy should use file_out rather than file_in
  vfs: Remove copy_file_range mountpoint checks
  vfs: Add vfs_copy_file_range() support for pagecache copies
  btrfs: btrfs_copy_file_range() only supports reflinks

Zach Brown (3):
  vfs: add copy_file_range syscall and vfs helper
  x86: add sys_copy_file_range to syscall tables
  btrfs: add .copy_file_range file operation

 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/btrfs/ctree.h   |   3 +
 fs/btrfs/file.c|   1 +
 fs/btrfs/ioctl.c   |  95 +-
 fs/read_write.c| 141 +
 include/linux/copy.h   |   6 ++
 include/linux/fs.h |   3 +
 include/linux/syscalls.h   |   3 +
 include/uapi/asm-generic/unistd.h  |   4 +-
 include/uapi/linux/Kbuild  |   1 +
 include/uapi/linux/copy.h  |   8 ++
 kernel/sys_ni.c|   1 +
 13 files changed, 228 insertions(+), 40 deletions(-)
 create mode 100644 include/linux/copy.h
 create mode 100644 include/uapi/linux/copy.h

-- 
2.6.0



[PATCH v5 2/9] x86: add sys_copy_file_range to syscall tables

2015-09-30 Thread Anna Schumaker
From: Zach Brown 

Add sys_copy_file_range to the x86 syscall tables.

Signed-off-by: Zach Brown 
[Anna Schumaker: Update syscall number in syscall_32.tbl]
Signed-off-by: Anna Schumaker 
---
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 7663c45..0531270 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -382,3 +382,4 @@
 373    i386    shutdown            sys_shutdown
 374    i386    userfaultfd         sys_userfaultfd
 375    i386    membarrier          sys_membarrier
+376    i386    copy_file_range     sys_copy_file_range
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 278842f..03a9396 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -331,6 +331,7 @@
 322    64      execveat            stub_execveat
 323    common  userfaultfd         sys_userfaultfd
 324    common  membarrier          sys_membarrier
+325    common  copy_file_range     sys_copy_file_range
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
2.6.0
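
For anyone checking which number a given tree ended up assigning: each row
of these tables is whitespace-separated as number, ABI, name, entry point,
so a lookup is a one-liner. This is a minimal sketch; syscall_nr is a
hypothetical helper name, and since the cover letter mentions the number
being bumped between revisions, any result only holds for the tree the
table file comes from.

```shell
# syscall_nr TABLE NAME: print the number assigned to NAME in a
# syscall_*.tbl-style file (columns: number, abi, name, entry point).
# Hypothetical helper; pass "-" as TABLE to read standard input.
syscall_nr() {
    awk -v name="$2" '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        $3 == name { print $1 }
    ' "$1"
}
```

For example: `syscall_nr arch/x86/entry/syscalls/syscall_64.tbl copy_file_range`.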



Re: RAID5 doesn't mount on boot, but you can afterwards?

2015-09-30 Thread Leonidas Spyropoulos
Hello,

On 30/09/15, Sjoerd wrote:
> Hi All,
> 
> A RAID5 setup on raw devices doesn't want to automount on boot.  
> [..]

Post your /etc/fstab file please.

Thanks
-- 
Sent using mutt


[PATCH] btrfs: fix resending received snapshot with parent

2015-09-30 Thread Robin Ruede
This fixes a regression introduced by 37b8d27d between v4.1 and v4.2.

When a snapshot is received, its received_uuid is set to the original
uuid of the subvolume. When that snapshot is then resent to a third
filesystem, its received_uuid is set to the second uuid
instead of the original one. The same was true for the parent_uuid.
This behaviour was partially changed in 37b8d27d, but in that patch
only the parent_uuid was taken from the real original,
not the uuid itself, causing the search for the parent to fail in
the case below.

This happens for example when trying to send a series of linked
snapshots (e.g. created by snapper) from the backup file system back to the 
original one.

The following commands reproduce the issue in v4.2.1
(no error in 4.1.6)

# setup three test file systems
for i in 1 2 3; do
truncate -s 50M fs$i
mkfs.btrfs fs$i
mkdir $i
mount fs$i $i
done
echo "content" > 1/testfile
btrfs su snapshot -r 1/ 1/snap1
echo "changed content" > 1/testfile
btrfs su snapshot -r 1/ 1/snap2

# works fine:
btrfs send 1/snap1 | btrfs receive 2/
btrfs send -p 1/snap1 1/snap2 | btrfs receive 2/

# ERROR: could not find parent subvolume
btrfs send 2/snap1 | btrfs receive 3/
btrfs send -p 2/snap1 2/snap2 | btrfs receive 3/

Signed-off-by: Robin Ruede 
---
 fs/btrfs/send.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index aa72bfd..890933b 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2351,8 +2351,14 @@ static int send_subvol_begin(struct send_ctx *sctx)
}
 
TLV_PUT_STRING(sctx, BTRFS_SEND_A_PATH, name, namelen);
-   TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
-   sctx->send_root->root_item.uuid);
+
+   if (!btrfs_is_empty_uuid(sctx->send_root->root_item.received_uuid))
+   TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
+   sctx->send_root->root_item.received_uuid);
+   else
+   TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
+   sctx->send_root->root_item.uuid);
+
TLV_PUT_U64(sctx, BTRFS_SEND_A_CTRANSID,
le64_to_cpu(sctx->send_root->root_item.ctransid));
if (parent_root) {
-- 
2.6.0
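
The selection the patch makes in send_subvol_begin() can be modelled
outside the kernel as follows. This is only a sketch of the decision, not
kernel code: uuid_to_send is a made-up name, and an all-zero UUID string is
assumed to stand in for what btrfs_is_empty_uuid() tests.

```shell
# An all-zero UUID stands in for "never received" in this sketch.
EMPTY_UUID=00000000-0000-0000-0000-000000000000

# uuid_to_send UUID RECEIVED_UUID: print the UUID that would be written to
# the send stream, preferring the received one when it is set.
uuid_to_send() {
    if [ -n "$2" ] && [ "$2" != "$EMPTY_UUID" ]; then
        printf '%s\n' "$2"    # resending a received snapshot: keep the original
    else
        printf '%s\n' "$1"    # locally created snapshot: use its own uuid
    fi
}
```

In the reproducer above, 2/snap1 has its own uuid plus a received_uuid that
points back at 1/snap1, so with this rule the stream sent to 3/ carries the
uuid of 1/snap1 rather than the one assigned on the second filesystem.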



Re: [PATCH v2] fstests: generic: Check if a bull fallocate will change extent number

2015-09-30 Thread Duncan
Qu Wenruo posted on Tue, 29 Sep 2015 18:48:37 +0800 as excerpted:

> Both gives quite good expression, I'll pick one of them.

... And for the one-line title, /bull/bad/ should do it. =:^)

People wanting details about bad /how/ can look at the fuller description 
or source.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH 1/2] btrfs: Fix lost-data-profile caused by auto removing bg

2015-09-30 Thread Filipe Manana
On Tue, Sep 29, 2015 at 2:51 PM, Zhao Lei  wrote:
> Reproduce:
>  (In integration-4.3 branch)
>
>  TEST_DEV=(/dev/vdg /dev/vdh)
>  TEST_DIR=/mnt/tmp
>
>  umount "$TEST_DEV" >/dev/null
>  mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"
>
>  mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
>  umount "$TEST_DEV"
>
>  mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
>  btrfs filesystem usage $TEST_DIR
>
> We can see the data chunk changed from raid1 to single:
>  # btrfs filesystem usage $TEST_DIR
>  Data,single: Size:8.00MiB, Used:0.00B
> /dev/vdg8.00MiB
>  #
>
> Reason:
>  When an empty filesystem is mounted with -o nospace_cache, the last
>  data blockgroup will be auto-removed on umount.
>
>  Then if we mount it again, there is no data chunk in the
>  filesystem, so the only available data profile is 0x0, and as a
>  result all new chunks are created with the single type.
>
> Fix:
>  Don't auto-delete last blockgroup for a raid type.
>
> Test:
>  Test by above script, and confirmed the logic by debug output.
>
> Signed-off-by: Zhao Lei 
> ---
>  fs/btrfs/extent-tree.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 79a5bd9..3505649 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10012,7 +10012,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>bg_list);
> space_info = block_group->space_info;
> list_del_init(&block_group->bg_list);
> -   if (ret || btrfs_mixed_space_info(space_info)) {
> +   if (ret || btrfs_mixed_space_info(space_info) ||
> +   block_group->list.next == block_group->list.prev) {

This isn't race free. The list block_group->list is protected by the
groups_sem semaphore, so you need to take it before doing this check.
You can do that in the "if" statement below this one, where we're
holding &space_info->groups_sem [1]

thanks

[1] 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?id=refs/tags/v4.3-rc3#n10021

> btrfs_put_block_group(block_group);
> continue;
> }
> --
> 1.8.5.1
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
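
When re-running the reproducer quoted above, the profile change can be
checked mechanically instead of by eye. The sketch below only parses text
in the format shown in the quoted "btrfs filesystem usage" output; it does
not call btrfs itself, and both function names are made up for
illustration.

```shell
# data_profiles: read `btrfs filesystem usage` output on stdin and print
# the profile of every "Data,<profile>:" line, one per line.
data_profiles() {
    sed -n 's/^[[:space:]]*Data,\([^:]*\):.*/\1/p'
}

# has_non_raid1_data: succeed when at least one data block group listed on
# stdin uses a profile other than RAID1 (for example "single").
has_non_raid1_data() {
    data_profiles | grep -qv '^RAID1$'
}
```

For example: `btrfs filesystem usage "$TEST_DIR" | has_non_raid1_data && echo "data profile lost"`.
Note that it also reports nothing when there is no data chunk at all, which
is the intermediate state the bug goes through.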


Re: [PATCH] btrfs: fix a compiler warning of may be used uninitialized

2015-09-30 Thread Holger Hoffstätte
On 09/30/15 05:55, Zhao Lei wrote:
>> count is defined iff add_to_ctl == true, so the patch is not necessary. And
>> I'm not quite sure that 0 passed down to __btrfs_add_free_space as 'bytes'
>> makes sense at all.
>> sense at all.
> 
> Agree above all.
> 
> So I wrote the following description in the changelog:
>   "Not real problem, just avoid warning of: ..."
> 
> It is just to avoid a compiler warning, no functional change.
> A warning in compiler output is not pretty:)

This looks more like a false-positive with gcc 4.8.3.
With 5.2:

..
  CC [M]  fs/btrfs/file-item.o
  CC [M]  fs/btrfs/inode-item.o
  CC [M]  fs/btrfs/inode-map.o
  CC [M]  fs/btrfs/disk-io.o
  CC [M]  fs/btrfs/transaction.o
..

No warning, as expected.

-h



RE: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg

2015-09-30 Thread Zhao Lei
Hi, Filipe Manana

> -Original Message-
> From: Filipe Manana [mailto:fdman...@gmail.com]
> Sent: Wednesday, September 30, 2015 3:41 PM
> To: Zhao Lei 
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg
> 
> On Wed, Sep 30, 2015 at 5:20 AM, Zhao Lei  wrote:
> > Hi, Filipe Manana
> >
> > Thanks for reviewing.
> >
> >> -Original Message-
> >> From: Filipe Manana [mailto:fdman...@gmail.com]
> >> Sent: Tuesday, September 29, 2015 11:48 PM
> >> To: Zhao Lei 
> >> Cc: linux-btrfs@vger.kernel.org
> >> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by
> >> balance bg
> >>
> >> On Tue, Sep 29, 2015 at 2:51 PM, Zhao Lei  wrote:
> >> > Reproduce:
> >> >  (In integration-4.3 branch)
> >> >
> >> >  TEST_DEV=(/dev/vdg /dev/vdh)
> >> >  TEST_DIR=/mnt/tmp
> >> >
> >> >  umount "$TEST_DEV" >/dev/null
> >> >  mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"
> >> >
> >> >  mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
> >> >  btrfs balance start -dusage=0 $TEST_DIR
> >> >  btrfs filesystem usage $TEST_DIR
> >> >
> >> >  dd if=/dev/zero of="$TEST_DIR"/file count=100
> >> >  btrfs filesystem usage $TEST_DIR
> >> >
> >> > Result:
> >> >  We can see "no data chunk" in first "btrfs filesystem usage":
> >> >  # btrfs filesystem usage $TEST_DIR
> >> >  Overall:
> >> > ...
> >> >  Metadata,single: Size:8.00MiB, Used:0.00B
> >> > /dev/vdg8.00MiB
> >> >  Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
> >> > /dev/vdg  122.88MiB
> >> > /dev/vdh  122.88MiB
> >> >  System,single: Size:4.00MiB, Used:0.00B
> >> > /dev/vdg4.00MiB
> >> >  System,RAID1: Size:8.00MiB, Used:16.00KiB
> >> > /dev/vdg8.00MiB
> >> > /dev/vdh8.00MiB
> >> >  Unallocated:
> >> > /dev/vdg1.06GiB
> >> > /dev/vdh1.07GiB
> >> >
> >> >  And "data chunks changed from raid1 to single" in second  "btrfs
> >> > filesystem usage":
> >> >  # btrfs filesystem usage $TEST_DIR
> >> >  Overall:
> >> > ...
> >> >  Data,single: Size:256.00MiB, Used:0.00B
> >> > /dev/vdh  256.00MiB
> >> >  Metadata,single: Size:8.00MiB, Used:0.00B
> >> > /dev/vdg8.00MiB
> >> >  Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
> >> > /dev/vdg  122.88MiB
> >> > /dev/vdh  122.88MiB
> >> >  System,single: Size:4.00MiB, Used:0.00B
> >> > /dev/vdg4.00MiB
> >> >  System,RAID1: Size:8.00MiB, Used:16.00KiB
> >> > /dev/vdg8.00MiB
> >> > /dev/vdh8.00MiB
> >> >  Unallocated:
> >> > /dev/vdg1.06GiB
> >> > /dev/vdh  841.92MiB
> >> >
> >> > Reason:
> >> >  btrfs balance deletes the last data chunk when there is no data in
> >> >  the filesystem, so "fi usage" then shows no data chunk.
> >> >
> >> >  When we later write to the fs, the only available data profile is
> >> >  0x0, so all new chunks are allocated with the single type.
> >> >
> >> > Fix:
> >> >  Allocate a data chunk explicitly in balance operation, to reserve
> >> > at least one data chunk in the filesystem.
> >>
> >> Allocate a data chunk explicitly to ensure we don't lose the raid profile
> >> for data.
> >>
> >
> > Thanks, will change in v2.
> >
> >> >
> >> > Test:
> >> >  Test by above script, and confirmed the logic by debug output.
> >> >
> >> > Signed-off-by: Zhao Lei 
> >> > ---
> >> >  fs/btrfs/volumes.c | 19 +++
> >> >  1 file changed, 19 insertions(+)
> >> >
> >> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> > index 6fc73586..3d5e41e 100644
> >> > --- a/fs/btrfs/volumes.c
> >> > +++ b/fs/btrfs/volumes.c
> >> > @@ -3277,6 +3277,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
> >> > u64 limit_data = bctl->data.limit;
> >> > u64 limit_meta = bctl->meta.limit;
> >> > u64 limit_sys = bctl->sys.limit;
> >> > +   int chunk_reserved = 0;
> >> >
> >> > /* step one make some room on all the devices */
> >> > devices = &fs_info->fs_devices->devices;
> >> > @@ -3387,6 +3388,24 @@ again:
> >> > goto loop;
> >> > }
> >> >
> >> > +   if (!chunk_reserved) {
> >> > +   trans = btrfs_start_transaction(chunk_root, 0);
> >> > +   if (IS_ERR(trans)) {
> >> > +   mutex_unlock(&fs_info->delete_unused_bgs_mutex);
> >> > +   ret = PTR_ERR(trans);
> >> > +   goto error;
> >> > +   }
> >> > +
> >> > +   ret = btrfs_force_chunk_alloc(trans, chunk_root, 1);
> >>
> >> Can we please use the symbol BTRFS_BLOCK_GROUP_DATA instead of 1?
> >>
> > Thanks, will change in v2.
> >
> >
> >> > +   if (ret < 0) {
> >> > +   mutex_unlock(&fs_info->delete_unused_bgs_mutex);

[PATCH] fstests: btrfs: Check if fallocate re-truncates page beyond EOF

2015-09-30 Thread Qu Wenruo
Even if the fallocate range doesn't cover the last page of a file, btrfs
will still re-truncate the last page.

Such behavior is redundant and unneeded, and is fixed by the
following kernel patch:

btrfs: Avoid truncate tailing page if fallocate range doesn't exceed
inode size

Add this test case to check for that malfunction.

Signed-off-by: Qu Wenruo 
---
 tests/btrfs/104 | 83 +
 tests/btrfs/104.out |  3 ++
 tests/btrfs/group   |  1 +
 3 files changed, 87 insertions(+)
 create mode 100755 tests/btrfs/104
 create mode 100644 tests/btrfs/104.out

diff --git a/tests/btrfs/104 b/tests/btrfs/104
new file mode 100755
index 0000000..f3ddc15
--- /dev/null
+++ b/tests/btrfs/104
@@ -0,0 +1,83 @@
+#! /bin/bash
+# FS QA Test 104
+#
+# Test that calling fallocate against a range which is already
+# allocated does not truncate beyond EOF
+#
+#---
+# Copyright (c) 2015 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/defrag
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+# Use 64K file size to match any sectorsize
+# And with a unaligned tailing range to ensure it will be at least 2 pages
+filesize=$(( 64 * 1024 + 1024 ))
+
+# Fallocate a range that will not cover the tailing page
+fallocrange=$(( 64 * 1024 ))
+
+_scratch_mkfs > /dev/null 2>&1
+_scratch_mount
+$XFS_IO_PROG -f -c "pwrite 0 $filesize" $SCRATCH_MNT/foo | _filter_xfs_io
+sync
+orig_extent_nr=`_extent_count $SCRATCH_MNT/foo`
+
+# As all space are allocated and even written to disk, this falloc
+# should do nothing with extent modification.
+$XFS_IO_PROG -f -c "falloc 0 $fallocrange" $SCRATCH_MNT/foo
+sync
+new_extent_nr=`_extent_count $SCRATCH_MNT/foo`
+
+echo "orig: $orig_extent_nr, new: $new_extent_nr" >> $seqres.full
+
+if [ "x$orig_extent_nr" != "x$new_extent_nr" ]; then
+   echo "Extent beyond EOF is re-truncated"
+   echo "orig_extent_nr: $orig_extent_nr new_extent_nr: $new_extent_nr"
+fi
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/104.out b/tests/btrfs/104.out
new file mode 100644
index 0000000..4c43e17
--- /dev/null
+++ b/tests/btrfs/104.out
@@ -0,0 +1,3 @@
+QA output created by 104
+wrote 66560/66560 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index e92a65a..640336b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -106,3 +106,4 @@
 101 auto quick replace
 102 auto quick metadata enospc
 103 auto quick clone compress
+104 auto quick prealloc
-- 
1.8.3.1
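
The sizes the test picks can be checked with a bit of arithmetic. The
sketch below reuses the two values from the test and counts pages;
pages_needed is a made-up helper, and the 4096-byte default page size is an
assumption (the test comment says 64K was chosen to suit any sectorsize).

```shell
# pages_needed BYTES [PAGE_SIZE]: number of pages needed to hold BYTES,
# rounding up. PAGE_SIZE defaults to 4096, which is an assumption here.
pages_needed() {
    page_size=${2:-4096}
    echo $(( ($1 + page_size - 1) / page_size ))
}

filesize=$(( 64 * 1024 + 1024 ))   # same expression as in the test
fallocrange=$(( 64 * 1024 ))       # same expression as in the test
```

With 4K pages the file spans one page more than the fallocate range, and
with 64K pages it still spans two pages against one, so in both cases the
tailing page lies outside the range passed to falloc.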



Re: lots of snapshots, forcing defragmentation, and figuring out which files to defrag or nodatacow

2015-09-30 Thread Duncan
James Cook posted on Mon, 28 Sep 2015 22:51:05 -0700 as excerpted:

> The context of these three questions is that I'm experiencing occasional
> hangs for several seconds while btrfs-transacti works, and very long
> boot times. General comments welcome. System info at bottom,
> end part of dmesg.log attached.
> 
> Q1:
> 
> I keep a lot of read-only snapshots (> 300 total) of all of my
> subvolumes and haven't deleted any so far. Is this in itself a problem
> or unanticipated use of btrfs?

Very large numbers of snapshots do slow things down, but ~300 isn't what 
I'd call "very large" -- we're talking thousands to tens of thousands.

My general recommendation is to keep it to ~250ish (under 300) per 
snapshotted subvolume, preferably under 2000 (and if possible 1000) 
total, easy enough to do even with automated frequent snapshotting (on 
the order of an hour apart, initially), as long as an equally automated 
snapshot thinning program is also established.  At ~250 per subvolume, 
1000 is 4 subvolumes worth, 2000 8 subvolumes worth.

A bit over 300, assuming they're all of the same subvolume, is getting a 
bit high, but it shouldn't be causing a lot of trouble yet.  It's just 
time to start thinking about a thinning program.
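
The numbers above work out as in this small sketch. The cap of 250 is the
rough figure from this message, not a hard limit in btrfs, and both
function names are made up for illustration.

```shell
PER_SUBVOL_CAP=250   # the "~250ish" per snapshotted subvolume from the text

# snapshot_total N: total snapshots if each of N subvolumes sits at the cap.
snapshot_total() {
    echo $(( $1 * PER_SUBVOL_CAP ))
}

# excess_snapshots COUNT: how many snapshots of one subvolume a thinning
# pass would need to drop to get back to the cap (0 if already at or under).
excess_snapshots() {
    if [ "$1" -gt "$PER_SUBVOL_CAP" ]; then
        echo $(( $1 - PER_SUBVOL_CAP ))
    else
        echo 0
    fi
}
```

A thinning script could feed `excess_snapshots` the current count for a
subvolume and delete that many of the oldest snapshots; how the count is
obtained and which snapshots to keep is left out here.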

There's one exception, quotas.  Quotas continue to be an issue on btrfs; 
they're on their third rewrite now and while they believe it will work 
this time, there are still some serious bugs that will take a couple more 
kernels to work out.  And besides not working right, they dramatically 
increase scalability issues.  So my recommendation, unless you're 
directly working with the devs to test, report problems with, and bug-
trace various quota issues, just don't run them on btrfs at this time.  
If you need quotas, use a filesystem where they're mature and work.  If 
you don't, use btrfs without them.

Really.  I've seen at least two confirmed cases posted where people 
running quotas turned them off and their scaling issues disappeared.
So if you have them on, that could well be your problem, right there.

> Q2:
> 
> I have some files that remain heavily fragmented (according to filefrag)
> even after I run btrfs fi defragment. I think this happens because btrfs
> doesn't want to unlink parts of the files from their snapshotted copies.
> Can I tell btrfs to defragment them anyway, and not worry about wasting
> space? And can I make the autodefrag mount option do this?
> 
> For example (not all output shown):
> 
> # filefrag *
> ...
> system@1973a03e3af1449ba5dd93362953fd5f-0001-00051f9377f11af6.journal:
> 553 extents found ...
> 
> # btrfs fi defragment -rf .
> 
> # filefrag *
> ...
> system@1973a03e3af1449ba5dd93362953fd5f-0001-00051f9377f11af6.journal:
> 331 extents found ...

Several points to note, here:

1) Filefrag doesn't understand btrfs compression.

If you don't use btrfs compression, this doesn't apply, but for btrfs 
compressed files, filefrag reports huge numbers of extents -- generally 
one per btrfs compression block (128 KiB), so 8 per MiB, 8192 per GiB of 
file size (before compression; it's not like btrfs gives you a way to see 
post-compression file size anyway).

But unless you run compress-force you won't see it everywhere, because 
btrfs only compresses some files.

2) Btrfs defrag isn't snapshot aware, and will only defrag the files it's 
directly pointed at, using more space as it breaks the reflinks to the 
snapshotted copy.  Around 3.9 snapshot-aware defrag was introduced, but 
it turned out to have *severe* scalability issues, so that was rolled 
back and snapshot-aware defrag was turned off again in, IIRC, 3.12 (thus 
well before what you're running).

So worrying about breaking snapshot reflinks while defragging isn't going 
to be your problem, that, per se, is simply not an issue.

3) What /can/ be an issue is dealt with using defrag's -t parameter.  I 
don't remember what the default target extent size is, but it's somewhat 
smaller than you might expect, well under a gig.  Extent sizes larger 
than this are considered to be already defragged and aren't touched.  
(While this does touch on #2 above as well, not unnecessarily breaking 
reflinks to extents shared with other snapshots, the mechanism is one of 
extent size, not whether the extent is shared with another snapshot.  So 
even if it's a new file not yet snapshotted, extents over this size won't 
be touched.)
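
To put a number on point 1 above: at one reported extent per 128 KiB
compression block, the count filefrag shows for a compressed file can be
estimated from its uncompressed size. This is a rough sketch that assumes
the whole file is compressed; compressed_extent_estimate is a made-up name.

```shell
# compressed_extent_estimate BYTES: rough number of extents filefrag would
# report for a fully compressed file, at one extent per 128 KiB block.
compressed_extent_estimate() {
    block=$(( 128 * 1024 ))
    echo $(( ($1 + block - 1) / block ))
}
```

A filefrag count in the same range as this estimate says little about real
fragmentation on a compressed file, which is worth checking before chasing
extent counts with defrag.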

It's worth keeping in mind that btrfs' nominal data chunk size is 1 GiB.  
As such, that's the nominal largest extent size as well, altho in some 
cases (data chunks created on nearly empty TiB-scale filesystems) data 
chunk size can be larger, multiple GiB, in which case extent size can be 
larger as well.

Regardless, extent sizes > 1 GiB really aren't going to be a performance 
issue anyway, so while using the -t 1G or -t 2G option is a good idea and 
should reduce fragmentation further for extents between the default size 
and your -t size, going above that isn't 

Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg

2015-09-30 Thread Filipe Manana
On Wed, Sep 30, 2015 at 5:20 AM, Zhao Lei  wrote:
> Hi, Filipe Manana
>
> Thanks for reviewing.
>
>> -Original Message-
>> From: Filipe Manana [mailto:fdman...@gmail.com]
>> Sent: Tuesday, September 29, 2015 11:48 PM
>> To: Zhao Lei 
>> Cc: linux-btrfs@vger.kernel.org
>> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg
>>
>> On Tue, Sep 29, 2015 at 2:51 PM, Zhao Lei  wrote:
>> > Reproduce:
>> >  (In integration-4.3 branch)
>> >
>> >  TEST_DEV=(/dev/vdg /dev/vdh)
>> >  TEST_DIR=/mnt/tmp
>> >
>> >  umount "$TEST_DEV" >/dev/null
>> >  mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"
>> >
>> >  mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
>> >  btrfs balance start -dusage=0 $TEST_DIR
>> >  btrfs filesystem usage $TEST_DIR
>> >
>> >  dd if=/dev/zero of="$TEST_DIR"/file count=100
>> >  btrfs filesystem usage $TEST_DIR
>> >
>> > Result:
>> >  We can see "no data chunk" in first "btrfs filesystem usage":
>> >  # btrfs filesystem usage $TEST_DIR
>> >  Overall:
>> > ...
>> >  Metadata,single: Size:8.00MiB, Used:0.00B
>> > /dev/vdg8.00MiB
>> >  Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
>> > /dev/vdg  122.88MiB
>> > /dev/vdh  122.88MiB
>> >  System,single: Size:4.00MiB, Used:0.00B
>> > /dev/vdg4.00MiB
>> >  System,RAID1: Size:8.00MiB, Used:16.00KiB
>> > /dev/vdg8.00MiB
>> > /dev/vdh8.00MiB
>> >  Unallocated:
>> > /dev/vdg1.06GiB
>> > /dev/vdh1.07GiB
>> >
>> >  And "data chunks changed from raid1 to single" in second  "btrfs
>> > filesystem usage":
>> >  # btrfs filesystem usage $TEST_DIR
>> >  Overall:
>> > ...
>> >  Data,single: Size:256.00MiB, Used:0.00B
>> > /dev/vdh  256.00MiB
>> >  Metadata,single: Size:8.00MiB, Used:0.00B
>> > /dev/vdg8.00MiB
>> >  Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
>> > /dev/vdg  122.88MiB
>> > /dev/vdh  122.88MiB
>> >  System,single: Size:4.00MiB, Used:0.00B
>> > /dev/vdg4.00MiB
>> >  System,RAID1: Size:8.00MiB, Used:16.00KiB
>> > /dev/vdg8.00MiB
>> > /dev/vdh8.00MiB
>> >  Unallocated:
>> > /dev/vdg1.06GiB
>> > /dev/vdh  841.92MiB
>> >
>> > Reason:
>> >  btrfs balance delete last data chunk in case of no data in  the
>> > filesystem, then we can see "no data chunk" by "fi usage"
>> >  command.
>> >
>> >  And when we do write operation to fs, the only available data
>> > profile is 0x0, result is all new chunks are allocated single type.
>> >
>> > Fix:
>> >  Allocate a data chunk explicitly in balance operation, to reserve  at
>> > least one data chunk in the filesystem.
>>
>> Allocate a data chunk explicitly to ensure we don't lose the raid profile 
>> for data.
>>
>
> Thanks, will change in v2.
>
>> >
>> > Test:
>> >  Test by above script, and confirmed the logic by debug output.
>> >
>> > Signed-off-by: Zhao Lei 
>> > ---
>> >  fs/btrfs/volumes.c | 19 +++
>> >  1 file changed, 19 insertions(+)
>> >
>> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> > index 6fc73586..3d5e41e 100644
>> > --- a/fs/btrfs/volumes.c
>> > +++ b/fs/btrfs/volumes.c
>> > @@ -3277,6 +3277,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>> > u64 limit_data = bctl->data.limit;
>> > u64 limit_meta = bctl->meta.limit;
>> > u64 limit_sys = bctl->sys.limit;
>> > +   int chunk_reserved = 0;
>> >
>> > /* step one make some room on all the devices */
>> > devices = &fs_info->fs_devices->devices;
>> > @@ -3387,6 +3388,24 @@ again:
>> > goto loop;
>> > }
>> >
>> > +   if (!chunk_reserved) {
>> > +   trans = btrfs_start_transaction(chunk_root, 0);
>> > +   if (IS_ERR(trans)) {
>> > +   mutex_unlock(&fs_info->delete_unused_bgs_mutex);
>> > +   ret = PTR_ERR(trans);
>> > +   goto error;
>> > +   }
>> > +
>> > +   ret = btrfs_force_chunk_alloc(trans,
>> > + chunk_root, 1);
>>
>> Can we please use the symbol BTRFS_BLOCK_GROUP_DATA instead of 1?
>>
> Thanks, will change in v2.
>
>
>> > +   if (ret < 0) {
>> > +   mutex_unlock(&fs_info->delete_unused_bgs_mutex);
>> > +   goto error;
>> > +   }
>> > +
>> > +   btrfs_end_transaction(trans, chunk_root);
>> > +   chunk_reserved = 1;
>> > +   }
>>
>> Can we do this logic only if the block group is a data one? I.e. no point
>> allocating a data block group if our balance only touches metadata ones
>> (due to some filter for e.g.).
>>
> Thanks for pointing it out, will change in v2.

I find it somewhat awkward that we always allocate a new data block
group no matter what. Some balance 

Re: [PATCH v4 0/9] Btrfs: free space B-tree

2015-09-30 Thread Omar Sandoval
On Tue, Sep 29, 2015 at 08:50:29PM -0700, Omar Sandoval wrote:
> Hi,
> 
> Here's one more reroll of the free space B-tree patches, a more scalable
> alternative to the free space cache. Minimal changes this time around, I
> mainly wanted to resend this after Holger and I cleared up his bug
> report here: http://www.spinics.net/lists/linux-btrfs/msg47165.html. It
> initially looked like it was a bug in a patch that Josef sent, then in
> this series, but finally Holger and I figured out that it was something
> else in the queue of patches he carries around, we just don't know what
> yet (I'm in the middle of looking into it).

Okay, I tracked down Holger's bug to a bad merge in his patch queue, so
we're off the hook.

> While trying to reproduce
> that bug, I ran xfstests about a trillion times and a bunch of stress
> tests, so this is fairly well tested now. Additionally, the last time
> around, Holger and Austin both bravely offered their Tested-bys on the
> series. I wasn't sure which patch(es) to tack them onto so here they
> are:
> 
> Tested-by: Holger Hoffstätte 
> Tested-by: Austin S. Hemmelgarn 
> 
> Thanks, everyone!
> 
> Omar
> 
> Changes from v3->v4:
> 
> - Added a missing btrfs_end_transaction() to btrfs_create_free_space_tree()
>   and btrfs_clear_free_space_tree() in the error cases after we abort the
>   transaction (see http://www.spinics.net/lists/linux-btrfs/msg47545.html)
> - Rebased the kernel patches on v4.3-rc3
> - Rebased the progs patches on v4.2.1
> 
> v3: http://www.spinics.net/lists/linux-btrfs/msg47095.html
> 
> Changes from v2->v3:
> 
> - Fixed a warning in the free space tree sanity tests caught by Zhao Lei.
> - Moved the addition of a block group to the free space tree to occur either
>   on the first attempt to modify the free space for the block group or in
>   btrfs_create_pending_block_groups(), whichever happens first. This avoids a
>   deadlock (lock recursion) when modifying the free space tree requires
>   allocating a new block group. In order to do this, it was simpler to change
>   the on-disk semantics: the superblock stripes will now appear to be free
>   space according to the free space tree, but load_free_space_tree() will
>   still exclude them when building the in-memory free space cache.
> - Changed the free_space_tree option to space_cache=v2 and made clear_cache
>   clear the free space tree. If the free space tree has been created,
>   the mount will fail unless space_cache=v2 or nospace_cache,clear_cache
>   is given because we cannot allow the free space tree to get out of
>   date.
> - Did a once-over of the code and caught a couple of error handling typos.
> 
> v2: http://www.spinics.net/lists/linux-btrfs/msg46796.html
> 
> Changes from v1->v2:
> 
> - Cleaned up a bunch of unnecessary instances of "if (ret) goto out; ret = 0"
> - Added aborts in the free space tree code closer to the site the error is
>   encountered: where we add or remove block groups, add or remove free space,
>   and also when we convert formats
> - Moved loading of the free space tree into caching_thread() and added a new
>   patch 3 in preparation for it
> - Commented a bunch of stuff in the extent buffer bitmap operations and
>   refactored some of the complicated logic
> 
> v1: http://www.spinics.net/lists/linux-btrfs/msg46713.html
> 
> Omar Sandoval (9):
>   Btrfs: add extent buffer bitmap operations
>   Btrfs: add extent buffer bitmap sanity tests
>   Btrfs: add helpers for read-only compat bits
>   Btrfs: refactor caching_thread()
>   Btrfs: introduce the free space B-tree on-disk format
>   Btrfs: implement the free space B-tree
>   Btrfs: add free space tree sanity tests
>   Btrfs: wire up the free space tree to the extent tree
>   Btrfs: add free space tree mount option
> 
>  fs/btrfs/Makefile  |5 +-
>  fs/btrfs/ctree.h   |  157 +++-
>  fs/btrfs/disk-io.c |   38 +
>  fs/btrfs/extent-tree.c |   98 +-
>  fs/btrfs/extent_io.c   |  183 +++-
>  fs/btrfs/extent_io.h   |   10 +-
>  fs/btrfs/free-space-tree.c | 1584
>  fs/btrfs/free-space-tree.h |   72 ++
>  fs/btrfs/super.c   |   56 +-
>  fs/btrfs/tests/btrfs-tests.c   |   52 ++
>  fs/btrfs/tests/btrfs-tests.h   |   10 +
>  fs/btrfs/tests/extent-io-tests.c   |  138 ++-
>  fs/btrfs/tests/free-space-tests.c  |   35 +-
>  fs/btrfs/tests/free-space-tree-tests.c |  571 
>  fs/btrfs/tests/qgroup-tests.c  |   20 +-
>  include/trace/events/btrfs.h   |3 +-
>  16 files changed, 2925 insertions(+), 107 deletions(-)
>  create mode 100644 fs/btrfs/free-space-tree.c
>  create mode 100644 fs/btrfs/free-space-tree.h
>  create mode 100644 fs/btrfs/tests/free-space-tree-tests.c
> 
> -- 
> 2.6.0
> 

-- 
Omar

[RFC PATCH] fstests: generic: Test that fsync works on file in overlayfs merged directory

2015-09-30 Thread Roman Lebedev
As per the overlayfs documentation, any activity on a merged directory
should look, to the application doing it, exactly as it would on a
normal, non-overlayfs-merged directory.

That is, e.g. simple fopen-fwrite-fsync-fclose sequence should
work just fine.
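
For reference, the sequence meant here can be sketched in C as below. This
is only an illustration of the user-space side, not the test itself (the
test uses xfs_io); the function name is hypothetical, and fflush() is added
because stdio buffers would otherwise not reach the kernel before fsync():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * fopen-fwrite-fsync-fclose on one file.
 * Returns 0 on success, -1 if any step fails.
 */
static int write_and_fsync(const char *path, const void *buf, size_t len)
{
	FILE *fp;
	int ret = 0;

	assert(path != NULL && buf != NULL);

	fp = fopen(path, "w");
	if (!fp)
		return -1;

	if (fwrite(buf, 1, len, fp) != len)
		ret = -1;
	/* Push stdio's user-space buffer to the kernel before fsync. */
	if (ret == 0 && fflush(fp) != 0)
		ret = -1;
	/* This is the call that ends up in the filesystem's fsync handler. */
	if (ret == 0 && fsync(fileno(fp)) != 0)
		ret = -1;
	if (fclose(fp) != 0)
		ret = -1;

	return ret;
}
```

Pointing the path argument at a file inside the merged directory exercises
the same fsync path as the test below.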

But apparently it does not. Add a simple generic test to check that.
As of right now (linux-4.2.1) this test fails at least on btrfs.

PS: An alternative (and probably better) approach would be to run the
fstests test suite with TEST_DIR set to an overlayfs merged directory.

Also, I'm not sure that this test fits here, but it's my best guess.

Signed-off-by: Roman Lebedev 
---
 tests/generic/111 | 80 +++
 tests/generic/111.out |  5 
 tests/generic/group   |  1 +
 3 files changed, 86 insertions(+)
 create mode 100755 tests/generic/111
 create mode 100644 tests/generic/111.out

diff --git a/tests/generic/111 b/tests/generic/111
new file mode 100755
index 000..3c2599b
--- /dev/null
+++ b/tests/generic/111
@@ -0,0 +1,80 @@
+#! /bin/bash
+# FS QA Test 111
+#
+# Test that fsync works on file in overlayfs merged directory
+#
+#---
+# Copyright (c) 2015 Roman Lebedev.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+lower=$TEST_DIR/lower.$$
+upper=$TEST_DIR/upper.$$
+work=$TEST_DIR/work.$$
+merged=$TEST_DIR/merged.$$
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+   umount $merged
+   rm -rf $merged
+   rm -rf $work
+   rm -rf $upper
+   rm -rf $lower
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+_supported_fs generic
+_supported_os IRIX Linux
+_require_test
+
+mkdir $lower
+
+$XFS_IO_PROG -f -c "pwrite 0 4k" -c "fsync" \
+   $lower/file | _filter_xfs_io
+
+mkdir $upper
+mkdir $work
+mkdir $merged
+
+sync
+
+mount -t overlay overlay -olowerdir=$lower \
+   -oupperdir=$upper -oworkdir=$work $merged
+
+$XFS_IO_PROG -f -c "pwrite 0 4k" -c "fsync" \
+   $merged/file | _filter_xfs_io
+
+# if we are here, then fsync did not crash, so we're good.
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/111.out b/tests/generic/111.out
new file mode 100644
index 000..36c7fde
--- /dev/null
+++ b/tests/generic/111.out
@@ -0,0 +1,5 @@
+QA output created by 111
+wrote 4096/4096 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 4096/4096 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/generic/group b/tests/generic/group
index 4ae256f..d3516f9 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -112,6 +112,7 @@
 107 auto quick metadata
 108 auto quick rw
 109 auto metadata dir
+111 auto quick
 112 rw aio auto quick
 113 rw aio auto quick
 117 attr auto quick
-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs

2015-09-30 Thread Roman Lebedev
Hello.

My / is btrfs.
To do some of my local work more cleanly I wanted to use overlayfs, 
but it didn't quite work.

Simple non-automatic sequence to reproduce the issue:
 mkdir lower upper work merged
 mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
 vi merged/file
 :wq

Results in vi being killed on exit, and the following trace appears in dmesg:

[34304.047841] BUG: unable to handle kernel paging request at 09618e56
[34304.047846] IP: [] btrfs_sync_file+0xa6/0x350 [btrfs]
[34304.047864] PGD 0 
[34304.047866] Oops: 0002 [#12] SMP 
[34304.047867] Modules linked in: overlay cpufreq_userspace cpufreq_stats 
cpufreq_powersave cpufreq_conservative binfmt_misc nfsd auth_rpcgss 
oid_registry nfs_acl nfs lockd grace fscache sunrpc fglrx(PO) nls_utf8 joydev 
nls_cp437 vfat fat hid_generic usbhid kvm_amd hid kvm crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic 
snd_hda_codec_hdmi sha256_ssse3 sha256_generic snd_hda_intel snd_hda_codec hmac 
drbg ansi_cprng aesni_intel snd_hda_core aes_x86_64 mxm_wmi snd_hwdep lrw 
eeepc_wmi snd_pcm gf128mul asus_wmi sparse_keymap rfkill video snd_timer 
glue_helper sp5100_tco evdev ablk_helper e1000e ohci_pci pcspkr snd ohci_hcd 
xhci_pci edac_mce_amd ehci_pci serio_raw xhci_hcd soundcore fam15h_power 
ehci_hcd cryptd edac_core ptp pps_core usbcore k10temp i2c_piix4
[34304.047893]  sg usb_common acpi_cpufreq wmi tpm_infineon button processor 
shpchp tpm_tis tpm thermal_sys tcp_yeah tcp_vegas it87 hwmon_vid loop 
parport_pc ppdev lp parport autofs4 crc32c_generic btrfs xor raid6_pq sd_mod 
crc32c_intel ahci libahci libata scsi_mod
[34304.047905] CPU: 4 PID: 13990 Comm: vi Tainted: P  DO
4.2.0-1-amd64 #1 Debian 4.2.1-2
[34304.047906] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./CROSSHAIR V FORMULA-Z, BIOS 2201 03/23/2015
[34304.047908] task: 8803d5f7f2c0 ti: 8806a3ec8000 task.ti: 
8806a3ec8000
[34304.047909] RIP: 0010:[]  [] 
btrfs_sync_file+0xa6/0x350 [btrfs]
[34304.047920] RSP: 0018:8806a3ecbe88  EFLAGS: 00010246
[34304.047921] RAX: 8803d5f7f2c0 RBX: 8807b2d46600 RCX: 81a6ad00
[34304.047922] RDX: 8000 RSI:  RDI: 8807c19f8970
[34304.047923] RBP: 8807c19f8970 R08:  R09: 0001
[34304.047924] R10:  R11: 0246 R12: 8807c19f88c8
[34304.047925] R13:  R14: 09618b22 R15: 55cb20184a70
[34304.047926] FS:  7f31c5492800() GS:88082fd0() 
knlGS:
[34304.047927] CS:  0010 DS:  ES:  CR0: 80050033
[34304.047928] CR2: 09618e56 CR3: 00044af44000 CR4: 000406e0
[34304.047929] Stack:
[34304.047930]  0001 7fff 880403d5b918 
8000
[34304.047932]    55cb20186d40 
8807b2d46600
[34304.047933]  0004 88044b249000 0020 
8807b2d46600
[34304.047935] Call Trace:
[34304.047939]  [] ? do_fsync+0x38/0x60
[34304.047940]  [] ? SyS_fsync+0x10/0x20
[34304.047943]  [] ? system_call_fast_compare_end+0xc/0x6b
[34304.047944] Code: 49 8b 0f 48 85 c9 75 e9 eb b3 48 8b 44 24 08 49 8d ac 24 
a8 00 00 00 48 89 ef 4c 29 e8 48 83 c0 01 48 89 44 24 18 e8 3a 59 3e e1  41 
ff 86 34 03 00 00 49 8b 84 24 70 ff ff ff 48 c1 e8 07 83 
[34304.047959] RIP  [] btrfs_sync_file+0xa6/0x350 [btrfs]
[34304.047970]  RSP 
[34304.047970] CR2: 09618e56
[34304.047972] ---[ end trace 414199893a542949 ]---

I was able to create a new fstests test that reproduces my issue,
and I'm sending it as a follow-up to this message.

Roman Lebedev (1):
  fstests: generic: Test that fsync works on file in overlayfs merged
directory

 tests/generic/111 | 80 +++
 tests/generic/111.out |  5 
 tests/generic/group   |  1 +
 3 files changed, 86 insertions(+)
 create mode 100755 tests/generic/111
 create mode 100644 tests/generic/111.out

-- 
2.6.0



RE: [PATCH 1/2] btrfs: Fix lost-data-profile caused by auto removing bg

2015-09-30 Thread Zhao Lei
Hi, Filipe Manana

> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org
> [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Filipe Manana
> Sent: Wednesday, September 30, 2015 3:43 PM
> To: Zhao Lei 
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 1/2] btrfs: Fix lost-data-profile caused by auto removing bg
> 
> On Tue, Sep 29, 2015 at 2:51 PM, Zhao Lei  wrote:
> > Reproduce:
> >  (In integration-4.3 branch)
> >
> >  TEST_DEV=(/dev/vdg /dev/vdh)
> >  TEST_DIR=/mnt/tmp
> >
> >  umount "$TEST_DEV" >/dev/null
> >  mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"
> >
> >  mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
> >  umount "$TEST_DEV"
> >
> >  mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
> >  btrfs filesystem usage $TEST_DIR
> >
> > We can see the data chunk changed from raid1 to single:
> >  # btrfs filesystem usage $TEST_DIR
> >  Data,single: Size:8.00MiB, Used:0.00B
> > /dev/vdg8.00MiB
> >  #
> >
> > Reason:
> >  When a empty filesystem mount with -o nospace_cache, the last  data
> > blockgroup will be auto-removed in umount.
> >
> >  Then if we mount it again, there is no data chunk in the  filesystem,
> > so the only available data profile is 0x0, result  is all new chunks
> > are created as single type.
> >
> > Fix:
> >  Don't auto-delete last blockgroup for a raid type.
> >
> > Test:
> >  Test by above script, and confirmed the logic by debug output.
> >
> > Signed-off-by: Zhao Lei 
> > ---
> >  fs/btrfs/extent-tree.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 79a5bd9..3505649 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -10012,7 +10012,8 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
> >bg_list);
> > space_info = block_group->space_info;
> > list_del_init(&block_group->bg_list);
> > -   if (ret || btrfs_mixed_space_info(space_info)) {
> > +   if (ret || btrfs_mixed_space_info(space_info) ||
> > +   block_group->list.next == block_group->list.prev) {
> 
> This isn't race free. The list block_group->list is protected by the
> groups_sem semaphore. We need to take it before doing this check.

Thanks for pointing this out.

> You can do that in the "if" statement below this one, where we're holding
> &space_info->groups_sem [1]
> 
It is hard to do the check in btrfs_remove_block_group(), because it is a
common function used by both balance and auto-remove bg.

For the balance operation, we can remove the last bg in some cases, so we
would need to add an additional argument to separate these two operations
(complex).

So I decided to take the groups_sem semaphore where the check is done.
Thanks for noticing this lock problem.
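
As a side note on why "list.next == list.prev" identifies the last block
group: in a circular doubly linked list, an entry whose two neighbours are
the same node has only the list head next to it. Below is a minimal
user-space sketch of that property. The struct and helpers are simplified
stand-ins written for this illustration, not the kernel's list.h, and the
locking discussed above is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's circular struct list_head. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link entry at the tail of the list anchored at head. */
static void list_add_tail_node(struct list_node *entry, struct list_node *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * For an entry that is linked into a list: true when it is the only one,
 * because both of its neighbours are then the list head.
 */
static bool is_only_entry(const struct list_node *entry)
{
	assert(entry != NULL);
	return entry->next == entry->prev;
}
```

With one entry on the list the check is true; as soon as a second entry is
added it becomes false for both of them. The result is only meaningful
while nothing else modifies the list, which is why the lock matters.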

btw, could I add your signed-off-by or reviewed-by in patch 2/2?

Thanks
Zhaolei

> thanks
> 
> [1]
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?id=refs/tags/v4.3-rc3#n10021
> 
> > btrfs_put_block_group(block_group);
> > continue;
> > }
> > --
> > 1.8.5.1
> >
> 
> 
> 
> --
> Filipe David Manana,
> 
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."



[PATCH v2 2/2] btrfs: Fix lost-data-profile caused by balance bg

2015-09-30 Thread Zhao Lei
Reproduce:
 (In integration-4.3 branch)

 TEST_DEV=(/dev/vdg /dev/vdh)
 TEST_DIR=/mnt/tmp

 umount "$TEST_DEV" >/dev/null
 mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"

 mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
 btrfs balance start -dusage=0 $TEST_DIR
 btrfs filesystem usage $TEST_DIR

 dd if=/dev/zero of="$TEST_DIR"/file count=100
 btrfs filesystem usage $TEST_DIR

Result:
 We can see "no data chunk" in first "btrfs filesystem usage":
 # btrfs filesystem usage $TEST_DIR
 Overall:
...
 Metadata,single: Size:8.00MiB, Used:0.00B
/dev/vdg8.00MiB
 Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
/dev/vdg  122.88MiB
/dev/vdh  122.88MiB
 System,single: Size:4.00MiB, Used:0.00B
/dev/vdg4.00MiB
 System,RAID1: Size:8.00MiB, Used:16.00KiB
/dev/vdg8.00MiB
/dev/vdh8.00MiB
 Unallocated:
/dev/vdg1.06GiB
/dev/vdh1.07GiB

 And "data chunks changed from raid1 to single" in second
 "btrfs filesystem usage":
 # btrfs filesystem usage $TEST_DIR
 Overall:
...
 Data,single: Size:256.00MiB, Used:0.00B
/dev/vdh  256.00MiB
 Metadata,single: Size:8.00MiB, Used:0.00B
/dev/vdg8.00MiB
 Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
/dev/vdg  122.88MiB
/dev/vdh  122.88MiB
 System,single: Size:4.00MiB, Used:0.00B
/dev/vdg4.00MiB
 System,RAID1: Size:8.00MiB, Used:16.00KiB
/dev/vdg8.00MiB
/dev/vdh8.00MiB
 Unallocated:
/dev/vdg1.06GiB
/dev/vdh  841.92MiB

Reason:
 btrfs balance deletes the last data chunk when there is no data in
 the filesystem, so the "fi usage" command shows no data chunk.

 When we then write to the fs, the only available data profile is
 0x0, so all new chunks are allocated as single type.

Fix:
 Allocate a data chunk explicitly to ensure we don't lose the
 raid profile for data.

Test:
 Test by above script, and confirmed the logic by debug output.

Changelog v1->v2:
1: Update patch description of "Fix" field
2: Use BTRFS_BLOCK_GROUP_DATA for btrfs_force_chunk_alloc instead
   of 1
3: Only reserve a chunk when balancing a data chunk.
All suggested-by: Filipe Manana 

Signed-off-by: Zhao Lei 
---
 fs/btrfs/volumes.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6fc73586..cd9e5bd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3277,6 +3277,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
u64 limit_data = bctl->data.limit;
u64 limit_meta = bctl->meta.limit;
u64 limit_sys = bctl->sys.limit;
+   int chunk_reserved = 0;
 
/* step one make some room on all the devices */
devices = &fs_info->fs_devices->devices;
@@ -3326,6 +3327,8 @@ again:
key.type = BTRFS_CHUNK_ITEM_KEY;
 
while (1) {
+   u64 chunk_type;
+
if ((!counting && atomic_read(&fs_info->balance_pause_req)) ||
atomic_read(&fs_info->balance_cancel_req)) {
ret = -ECANCELED;
@@ -3371,8 +3374,10 @@ again:
spin_unlock(&fs_info->balance_lock);
}
 
+   chunk_type = btrfs_chunk_type(leaf, chunk);
ret = should_balance_chunk(chunk_root, leaf, chunk,
   found_key.offset);
+
btrfs_release_path(path);
if (!ret) {
mutex_unlock(&fs_info->delete_unused_bgs_mutex);
@@ -3387,6 +3392,25 @@ again:
goto loop;
}
 
+   if ((chunk_type & BTRFS_BLOCK_GROUP_DATA) && !chunk_reserved) {
+   trans = btrfs_start_transaction(chunk_root, 0);
+   if (IS_ERR(trans)) {
+   mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+   ret = PTR_ERR(trans);
+   goto error;
+   }
+
+   ret = btrfs_force_chunk_alloc(trans, chunk_root,
+ BTRFS_BLOCK_GROUP_DATA);
+   if (ret < 0) {
+   mutex_unlock(&fs_info->delete_unused_bgs_mutex);
+   goto error;
+   }
+
+   btrfs_end_transaction(trans, chunk_root);
+   chunk_reserved = 1;
+   }
+
ret = btrfs_relocate_chunk(chunk_root,
   found_key.offset);
mutex_unlock(&fs_info->delete_unused_bgs_mutex);
-- 
1.8.5.1



[PATCH v2 1/2] btrfs: Fix lost-data-profile caused by auto removing bg

2015-09-30 Thread Zhao Lei
Reproduce:
 (In integration-4.3 branch)

 TEST_DEV=(/dev/vdg /dev/vdh)
 TEST_DIR=/mnt/tmp

 umount "$TEST_DEV" >/dev/null
 mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"

 mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
 umount "$TEST_DEV"

 mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
 btrfs filesystem usage $TEST_DIR

We can see the data chunk changed from raid1 to single:
 # btrfs filesystem usage $TEST_DIR
 Data,single: Size:8.00MiB, Used:0.00B
/dev/vdg8.00MiB
 #

Reason:
 When an empty filesystem is mounted with -o nospace_cache, the last
 data blockgroup is auto-removed on umount.

 Then if we mount it again, there is no data chunk in the
 filesystem, so the only available data profile is 0x0, and as a
 result all new chunks are created as single type.

Fix:
 Don't auto-delete last blockgroup for a raid type.

Test:
 Test by above script, and confirmed the logic by debug output.

Changelog v1->v2:
1: Check block_group->list while holding the
   space_info->groups_sem semaphore.
Noticed-by: Filipe Manana 

Signed-off-by: Zhao Lei 
---
 fs/btrfs/extent-tree.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 79a5bd9..ed9426c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10010,8 +10010,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
block_group = list_first_entry(&fs_info->unused_bgs,
   struct btrfs_block_group_cache,
   bg_list);
-   space_info = block_group->space_info;
list_del_init(&block_group->bg_list);
+
+   space_info = block_group->space_info;
+
+   down_read(&space_info->groups_sem);
+   if (block_group->list.next == block_group->list.prev) {
+   up_read(&space_info->groups_sem);
+   btrfs_put_block_group(block_group);
+   continue;
+   }
+   up_read(&space_info->groups_sem);
+
if (ret || btrfs_mixed_space_info(space_info)) {
btrfs_put_block_group(block_group);
continue;
-- 
1.8.5.1



[PATCH V5 12/13] Btrfs: prepare_pages: Retry adding a page to the page cache

2015-09-30 Thread Chandan Rajendra
When reading the page from the disk, we can race with Direct I/O which can get
the page lock (before prepare_uptodate_page() gets it) and can go ahead and
invalidate the page. Hence if the page is not found in the inode's address
space, retry the operation of getting a page.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5715e29..76db77c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1316,6 +1316,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
int faili;
 
for (i = 0; i < num_pages; i++) {
+again:
pages[i] = find_or_create_page(inode->i_mapping, index + i,
   mask | __GFP_WRITE);
if (!pages[i]) {
@@ -1330,6 +1331,21 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
if (i == num_pages - 1)
err = prepare_uptodate_page(pages[i],
pos + write_bytes, false);
+
+   /*
+* When reading the page from the disk, we can race
+* with direct i/o which can get the page lock (before
+* prepare_uptodate_page() gets it) and can go ahead
+* and invalidate the page. Hence if the page is found
+* to be not belonging to the inode's address space,
+* retry the operation of getting a page.
+*/
+   if (unlikely(pages[i]->mapping != inode->i_mapping)) {
+   unlock_page(pages[i]);
+   page_cache_release(pages[i]);
+   goto again;
+   }
+
if (err) {
page_cache_release(pages[i]);
faili = i - 1;
-- 
2.1.0



[PATCH V5 07/13] Btrfs: Use (eb->start, seq) as search key for tree modification log

2015-09-30 Thread Chandan Rajendra
In subpagesize-blocksize a page can map multiple extent buffers and hence
using (page index, seq) as the search key is incorrect. For example, searching
through tree modification log tree can return an entry associated with the
first extent buffer mapped by the page (if such an entry exists), when we are
actually searching for entries associated with extent buffers that are mapped
at position 2 or more in the page.
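
The collision can be shown with a small sketch. The page and block sizes
(64 KiB pages, 4 KiB blocks) and the helper names are assumptions chosen for
illustration only; the kernel code uses PAGE_CACHE_SHIFT directly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old key: logical address shifted down by the page size (in bits). */
static uint64_t page_index_key(uint64_t eb_start, unsigned int page_shift)
{
	assert(page_shift < 64);
	return eb_start >> page_shift;
}

/* True when two distinct extent buffers would share the old key. */
static bool old_keys_collide(uint64_t eb_start_a, uint64_t eb_start_b,
			     unsigned int page_shift)
{
	return eb_start_a != eb_start_b &&
	       page_index_key(eb_start_a, page_shift) ==
	       page_index_key(eb_start_b, page_shift);
}
```

With a page shift of 16, extent buffers starting at 0x10000 and 0x11000 both
map to index 1, so a search cannot tell them apart. Keying on eb->start
itself keeps them distinct.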

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5f745ea..719ed3c 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -311,7 +311,7 @@ struct tree_mod_root {
 
 struct tree_mod_elem {
struct rb_node node;
-   u64 index;  /* shifted logical */
+   u64 logical;
u64 seq;
enum mod_log_op op;
 
@@ -435,11 +435,11 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
 
 /*
  * key order of the log:
- *   index -> sequence
+ *   node/leaf start address -> sequence
  *
- * the index is the shifted logical of the *new* root node for root replace
- * operations, or the shifted logical of the affected block for all other
- * operations.
+ * The 'start address' is the logical address of the *new* root node
+ * for root replace operations, or the logical address of the affected
+ * block for all other operations.
  *
  * Note: must be called with write lock (tree_mod_log_write_lock).
  */
@@ -460,9 +460,9 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
while (*new) {
cur = container_of(*new, struct tree_mod_elem, node);
parent = *new;
-   if (cur->index < tm->index)
+   if (cur->logical < tm->logical)
new = &((*new)->rb_left);
-   else if (cur->index > tm->index)
+   else if (cur->logical > tm->logical)
new = &((*new)->rb_right);
else if (cur->seq < tm->seq)
new = &((*new)->rb_left);
@@ -523,7 +523,7 @@ alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
if (!tm)
return NULL;
 
-   tm->index = eb->start >> PAGE_CACHE_SHIFT;
+   tm->logical = eb->start;
if (op != MOD_LOG_KEY_ADD) {
btrfs_node_key(eb, &tm->key, slot);
tm->blockptr = btrfs_node_blockptr(eb, slot);
@@ -588,7 +588,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
goto free_tms;
}
 
-   tm->index = eb->start >> PAGE_CACHE_SHIFT;
+   tm->logical = eb->start;
tm->slot = src_slot;
tm->move.dst_slot = dst_slot;
tm->move.nr_items = nr_items;
@@ -699,7 +699,7 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
goto free_tms;
}
 
-   tm->index = new_root->start >> PAGE_CACHE_SHIFT;
+   tm->logical = new_root->start;
tm->old_root.logical = old_root->start;
tm->old_root.level = btrfs_header_level(old_root);
tm->generation = btrfs_header_generation(old_root);
@@ -739,16 +739,15 @@ __tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq,
struct rb_node *node;
struct tree_mod_elem *cur = NULL;
struct tree_mod_elem *found = NULL;
-   u64 index = start >> PAGE_CACHE_SHIFT;
 
tree_mod_log_read_lock(fs_info);
tm_root = &fs_info->tree_mod_log;
node = tm_root->rb_node;
while (node) {
cur = container_of(node, struct tree_mod_elem, node);
-   if (cur->index < index) {
+   if (cur->logical < start) {
node = node->rb_left;
-   } else if (cur->index > index) {
+   } else if (cur->logical > start) {
node = node->rb_right;
} else if (cur->seq < min_seq) {
node = node->rb_left;
@@ -1230,9 +1229,10 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
return NULL;
 
/*
-* the very last operation that's logged for a root is the replacement
-* operation (if it is replaced at all). this has the index of the *new*
-* root, making it the very first operation that's logged for this root.
+* the very last operation that's logged for a root is the
+* replacement operation (if it is replaced at all). this has
+* the logical address of the *new* root, making it the very
+* first operation that's logged for this root.
 */
while (1) {
tm = tree_mod_log_search_oldest(fs_info, root_logical,
@@ -1336,7 +1336,7 @@ __tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
if (!next)
break;
 

[PATCH V5 01/13] Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size

2015-09-30 Thread Chandan Rajendra
Currently, the code reserves and releases extents in multiples of
PAGE_CACHE_SIZE. Fix this by doing reservations and releases in block size units.

Signed-off-by: Chandan Rajendra 
---
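Note (not part of the patch): a minimal userspace sketch of the arithmetic
this change relies on, assuming a power-of-two sector size. The helper names
reserve_by_page() and reserve_by_sector() are hypothetical and exist only for
illustration; they mirror the "num_pages << PAGE_CACHE_SHIFT" and
"round_up(write_bytes + sector_offset, root->sectorsize)" expressions in the
diff.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Page-based scheme: reserve every page touched by [pos, pos + write_bytes). */
static uint64_t reserve_by_page(uint64_t pos, uint64_t write_bytes,
				uint64_t page_size)
{
	uint64_t offset = pos & (page_size - 1);
	uint64_t num_pages = (write_bytes + offset + page_size - 1) / page_size;

	return num_pages * page_size;
}

/* Sector-based scheme: reserve only the sectors touched by the write. */
static uint64_t reserve_by_sector(uint64_t pos, uint64_t write_bytes,
				  uint64_t sectorsize)
{
	uint64_t sector_offset = pos & (sectorsize - 1);

	return round_up_pow2(write_bytes + sector_offset, sectorsize);
}
```

With a 64K page and 4K sectors, a 100-byte write at offset 4100 reserves
65536 bytes under the page-based scheme and 4096 bytes under the
sector-based one.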
 fs/btrfs/file.c | 44 +++-
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b823fac..12ce401 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -499,7 +499,7 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
loff_t isize = i_size_read(inode);
 
start_pos = pos & ~((u64)root->sectorsize - 1);
-   num_bytes = ALIGN(write_bytes + pos - start_pos, root->sectorsize);
+   num_bytes = round_up(write_bytes + pos - start_pos, root->sectorsize);
 
end_of_last_block = start_pos + num_bytes - 1;
err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
@@ -1362,16 +1362,19 @@ fail:
 static noinline int
 lock_and_cleanup_extent_if_need(struct inode *inode, struct page **pages,
size_t num_pages, loff_t pos,
+   size_t write_bytes,
u64 *lockstart, u64 *lockend,
struct extent_state **cached_state)
 {
+   struct btrfs_root *root = BTRFS_I(inode)->root;
u64 start_pos;
u64 last_pos;
int i;
int ret = 0;
 
-   start_pos = pos & ~((u64)PAGE_CACHE_SIZE - 1);
-   last_pos = start_pos + ((u64)num_pages << PAGE_CACHE_SHIFT) - 1;
+   start_pos = round_down(pos, root->sectorsize);
+   last_pos = start_pos
+   + round_up(pos + write_bytes - start_pos, root->sectorsize) - 1;
 
if (start_pos < inode->i_size) {
struct btrfs_ordered_extent *ordered;
@@ -1489,6 +1492,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 
while (iov_iter_count(i) > 0) {
size_t offset = pos & (PAGE_CACHE_SIZE - 1);
+   size_t sector_offset;
size_t write_bytes = min(iov_iter_count(i),
 nrptrs * (size_t)PAGE_CACHE_SIZE -
 offset);
@@ -1497,6 +1501,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
size_t reserve_bytes;
size_t dirty_pages;
size_t copied;
+   size_t dirty_sectors;
+   size_t num_sectors;
 
WARN_ON(num_pages > nrptrs);
 
@@ -1509,8 +1515,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
break;
}
 
-   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+   sector_offset = pos & (root->sectorsize - 1);
+   reserve_bytes = round_up(write_bytes + sector_offset,
+   root->sectorsize);
+
ret = btrfs_check_data_free_space(inode, reserve_bytes, write_bytes);
+
if (ret == -ENOSPC &&
(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
  BTRFS_INODE_PREALLOC))) {
@@ -1523,7 +1533,10 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 */
num_pages = DIV_ROUND_UP(write_bytes + offset,
 PAGE_CACHE_SIZE);
-   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+   reserve_bytes = round_up(write_bytes
+   + sector_offset,
+   root->sectorsize);
+
ret = 0;
} else {
ret = -ENOSPC;
@@ -1558,8 +1571,8 @@ again:
break;
 
ret = lock_and_cleanup_extent_if_need(inode, pages, num_pages,
- pos, &lockstart, &lockend,
- &cached_state);
+   pos, write_bytes, &lockstart,
+   &lockend, &cached_state);
if (ret < 0) {
if (ret == -EAGAIN)
goto again;
@@ -1595,9 +1608,14 @@ again:
 * we still have an outstanding extent for the chunk we actually
 * managed to copy.
 */
-   if (num_pages > dirty_pages) {
-   release_bytes = (num_pages - dirty_pages) <<
-   PAGE_CACHE_SHIFT;
+   num_sectors = reserve_bytes >> inode->i_blkbits;
+   dirty_sectors = round_up(copied + sector_offset,
+   root->sectorsize);
+   dirty_sectors >>= 

[PATCH V5 05/13] Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units

2015-09-30 Thread Chandan Rajendra
In a subpagesize-blocksize scenario, if i_size falls in a block which is not
the last block in the page, the space to be reserved should cover only the
blocks up to and including the one containing i_size, not the whole page.

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 
---
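Note (not part of the patch): a minimal userspace sketch of the reservation
size computed for the faulting page, assuming power-of-two page and sector
sizes and i_size > 0. The function name mkwrite_reserved_space() is
hypothetical; the body mirrors the "page->index == ((size - 1) >>
PAGE_CACHE_SHIFT)" test and the round_up() call in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bytes to keep reserved for the page at page_index. Only the page that
 * contains the last byte of the file can need less than a full page.
 */
static uint64_t mkwrite_reserved_space(uint64_t page_index, uint64_t i_size,
				       uint64_t page_size, uint64_t sectorsize)
{
	uint64_t page_start = page_index * page_size;
	uint64_t reserved = page_size;

	assert(i_size > page_start);
	if (page_index == (i_size - 1) / page_size) {
		uint64_t r = round_up_pow2(i_size - page_start, sectorsize);

		if (r < page_size)
			reserved = r;
	}
	return reserved;
}
```

With a 64K page and 4K sectors, a 5000-byte file keeps 8192 bytes reserved
for page 0; the remaining 57344 bytes of the initial page-sized reservation
are what the patch hands back via btrfs_delalloc_release_space().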
 fs/btrfs/inode.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5301d4e..5e6052d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8659,11 +8659,24 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
loff_t size;
int ret;
int reserved = 0;
+   u64 reserved_space;
u64 page_start;
u64 page_end;
+   u64 end;
+
+   reserved_space = PAGE_CACHE_SIZE;
 
sb_start_pagefault(inode->i_sb);
-   ret  = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+
+   /*
+ Reserving delalloc space after obtaining the page lock can lead to
+ deadlock. For example, if a dirty page is locked by this function
+ and the call to btrfs_delalloc_reserve_space() ends up triggering
+ dirty page write out, then the btrfs_writepage() function could
+ end up waiting indefinitely to get a lock on the page currently
+ being processed by btrfs_page_mkwrite() function.
+*/
+   ret  = btrfs_delalloc_reserve_space(inode, reserved_space);
if (!ret) {
ret = file_update_time(vma->vm_file);
reserved = 1;
@@ -8684,6 +8697,7 @@ again:
size = i_size_read(inode);
page_start = page_offset(page);
page_end = page_start + PAGE_CACHE_SIZE - 1;
+   end = page_end;
 
if ((page->mapping != inode->i_mapping) ||
(page_start >= size)) {
@@ -8699,7 +8713,7 @@ again:
 * we can't set the delalloc bits if there are pending ordered
 * extents.  Drop our locks and wait for them to finish
 */
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start, page_end);
if (ordered) {
unlock_extent_cached(io_tree, page_start, page_end,
 &cached_state, GFP_NOFS);
@@ -8709,6 +8723,18 @@ again:
goto again;
}
 
+   if (page->index == ((size - 1) >> PAGE_CACHE_SHIFT)) {
+   reserved_space = round_up(size - page_start, root->sectorsize);
+   if (reserved_space < PAGE_CACHE_SIZE) {
+   end = page_start + reserved_space - 1;
+   spin_lock(&BTRFS_I(inode)->lock);
+   BTRFS_I(inode)->outstanding_extents++;
+   spin_unlock(&BTRFS_I(inode)->lock);
+   btrfs_delalloc_release_space(inode,
+   PAGE_CACHE_SIZE - reserved_space);
+   }
+   }
+
/*
 * XXX - page_mkwrite gets called every time the page is dirtied, even
 * if it was already dirty, so for space accounting reasons we need to
@@ -8716,12 +8742,12 @@ again:
 * is probably a better way to do this, but for now keep consistent with
 * prepare_pages in the normal write path.
 */
-   clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end,
+   clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  EXTENT_DIRTY | EXTENT_DELALLOC |
  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
  0, 0, &cached_state, GFP_NOFS);
 
-   ret = btrfs_set_extent_delalloc(inode, page_start, page_end,
+   ret = btrfs_set_extent_delalloc(inode, page_start, end,
&cached_state);
if (ret) {
unlock_extent_cached(io_tree, page_start, page_end,
@@ -8760,7 +8786,7 @@ out_unlock:
}
unlock_page(page);
 out:
-   btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+   btrfs_delalloc_release_space(inode, reserved_space);
 out_noreserve:
sb_end_pagefault(inode->i_sb);
return ret;
-- 
2.1.0



[PATCH V5 13/13] Btrfs: Return valid delalloc range when the page does not have PG_Dirty flag set or has been invalidated

2015-09-30 Thread Chandan Rajendra
The following issue was observed when running the generic/095 test on the
subpagesize-blocksize patchset.

Assume that we are trying to write a dirty page that is mapping file offset
range [159744, 163839].

writepage_delalloc()
  find_lock_delalloc_range(*start = 159744, *end = 0)
find_delalloc_range()
  Returns range [X, Y] where (X > 163839)
lock_delalloc_pages()
  One of the pages in range [X, Y] has dirty flag cleared;
  Loop once more restricting the delalloc range to span only
  PAGE_CACHE_SIZE bytes;
find_delalloc_range()
  Returns range [356352, 360447];
lock_delalloc_pages()
  The page [356352, 360447] has dirty flag cleared;
Returns with *start = 159744 and *end = 0;
  *start = *end + 1;
  find_lock_delalloc_range(*start = 1, *end = 0)
Finds and returns delalloc range [1, 12288];
  cow_file_range()
Clears delalloc range [1, 12288]
Create ordered extent for range [1, 12288]

The ordered extent thus created above breaks the rule that extents have to be
aligned to the filesystem's block size.

In cases where lock_delalloc_pages() fails (either due to PG_dirty flag being
cleared or the page no longer being a member of the inode's page cache), this
patch sets and returns the delalloc range that was found by
find_delalloc_range().

Signed-off-by: Chandan Rajendra 
---
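Note (not part of the patch): a minimal userspace sketch of why the stale
*end value in the trace above leads to an unaligned range. It assumes, as the
trace shows, that the caller resumes its search at "*end + 1"; the helper
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset at which the caller resumes searching after a failed lookup. */
static uint64_t next_search_start(uint64_t returned_end)
{
	return returned_end + 1;
}

/* True if off is a multiple of blocksize (a power of two). */
static bool is_block_aligned(uint64_t off, uint64_t blocksize)
{
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	return (off & (blocksize - 1)) == 0;
}
```

With *end left at 0 the next search starts at offset 1, which is not block
aligned. With *end set to the end of the range that was found (360447 in the
trace) the next search starts at 360448, a multiple of 4096.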
 fs/btrfs/extent_io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0ee486a..3912d1f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1731,6 +1731,8 @@ again:
goto again;
} else {
found = 0;
+   *start = delalloc_start;
+   *end = delalloc_end;
goto out_failed;
}
}
-- 
2.1.0



[PATCH V5 09/13] Btrfs: Limit inline extents to root->sectorsize

2015-09-30 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.

Signed-off-by: Chandan Rajendra 
---
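Note (not part of the patch): a minimal userspace sketch of the size check
being changed. It models only the first two conditions of the if statement in
the diff (start > 0 and the upper bound on actual_end); the function name
exceeds_inline_limit() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the range cannot be stored inline because it does not start at
 * file offset 0 or ends beyond the given limit.
 */
static bool exceeds_inline_limit(uint64_t start, uint64_t actual_end,
				 uint64_t limit)
{
	return start > 0 || actual_end > limit;
}
```

With a 64K page and 4K sectors, an 8000-byte file passes the check when the
limit is PAGE_CACHE_SIZE but is rejected when the limit is root->sectorsize.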
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b1ceba4..b2eedb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0



[PATCH V5 08/13] Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length

2015-09-30 Thread Chandan Rajendra
In a subpagesize-blocksize scenario, map_length can be less than the length of
a bio vector. Such a condition may cause btrfs_submit_direct_hook() to submit a
zero length bio. Fix this by comparing map_length against the block size rather
than against bv_len.

Signed-off-by: Chandan Rajendra 
---
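Note (not part of the patch): a simplified userspace sketch of walking one
bio vector in block sized steps. It assumes map_length is the same for every
bio and is at least one block; the real code remaps after each submission.
The function name split_segment() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Walk a segment of seg_len bytes one block at a time. A new "bio" is
 * started whenever adding the next block would exceed map_length.
 * Returns the number of bios and stores the smallest bio length in
 * *smallest (UINT64_MAX if no bio was produced).
 */
static unsigned int split_segment(uint32_t seg_len, uint32_t blocksize,
				  uint64_t map_length, uint64_t *smallest)
{
	uint64_t submit_len = 0;
	unsigned int bios = 0;
	uint32_t done;

	assert(blocksize != 0 && seg_len % blocksize == 0);
	assert(map_length >= blocksize);

	*smallest = UINT64_MAX;
	for (done = 0; done < seg_len; done += blocksize) {
		if (map_length < submit_len + blocksize) {
			/* Current bio is full: account for it and start over. */
			if (submit_len < *smallest)
				*smallest = submit_len;
			bios++;
			submit_len = 0;
		}
		submit_len += blocksize;
	}
	if (submit_len) {
		if (submit_len < *smallest)
			*smallest = submit_len;
		bios++;
	}
	return bios;
}
```

Because a block is added before the length check can fire again, every bio
produced this way carries at least one block, so a zero length bio cannot be
submitted.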
 fs/btrfs/inode.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4fbe9de..b1ceba4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8148,9 +8148,11 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
u64 file_offset = dip->logical_offset;
u64 submit_len = 0;
u64 map_length;
-   int nr_pages = 0;
-   int ret;
+   u32 blocksize = root->sectorsize;
int async_submit = 0;
+   int nr_sectors;
+   int ret;
+   int i;
 
map_length = orig_bio->bi_iter.bi_size;
ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
@@ -8180,9 +8182,12 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
atomic_inc(&dip->pending_bios);
 
while (bvec <= (orig_bio->bi_io_vec + orig_bio->bi_vcnt - 1)) {
-   if (map_length < submit_len + bvec->bv_len ||
-   bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-bvec->bv_offset) < bvec->bv_len) {
+   nr_sectors = bvec->bv_len >> inode->i_blkbits;
+   i = 0;
+next_block:
+   if (unlikely(map_length < submit_len + blocksize ||
+   bio_add_page(bio, bvec->bv_page, blocksize,
+   bvec->bv_offset + (i * blocksize)) < blocksize)) {
/*
 * inc the count before we submit the bio so
 * we know the end IO handler won't happen before
@@ -8203,7 +8208,6 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
file_offset += submit_len;
 
submit_len = 0;
-   nr_pages = 0;
 
bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev,
  start_sector, GFP_NOFS);
@@ -8221,9 +8225,14 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
bio_put(bio);
goto out_err;
}
+
+   goto next_block;
} else {
-   submit_len += bvec->bv_len;
-   nr_pages++;
+   submit_len += blocksize;
+   if (--nr_sectors) {
+   i++;
+   goto next_block;
+   }
bvec++;
}
}
-- 
2.1.0



[PATCH V5 04/13] Btrfs: fallocate: Work with sectorsized blocks

2015-09-30 Thread Chandan Rajendra
Make the fallocate code work with sectorsized blocks. While at it, this commit
changes btrfs_truncate_page() to truncate sectorsized blocks instead of pages.
Hence the function has been renamed to btrfs_truncate_block().

Signed-off-by: Chandan Rajendra 
---
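Note (not part of the patch): a minimal userspace sketch of the same_page to
same_block change in btrfs_punch_hole(). The function name same_unit() is
hypothetical; bits stands for either PAGE_CACHE_SHIFT or inode->i_blkbits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the byte range [offset, offset + len) lies entirely inside one
 * unit of size (1 << bits).
 */
static bool same_unit(uint64_t offset, uint64_t len, unsigned int bits)
{
	assert(len > 0);
	return (offset >> bits) == ((offset + len - 1) >> bits);
}
```

A 200-byte hole at offset 4000 lies within a single 64K page (bits = 16) but
spans two 4K blocks (bits = 12), so the page-based test would wrongly take
the single-unit truncate path on a 4K-sector filesystem.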
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/file.c  | 47 +--
 fs/btrfs/inode.c | 52 +++-
 3 files changed, 53 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe3..99a0fff 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3893,7 +3893,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len);
-int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
+int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 12ce401..360d56d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2280,23 +2280,26 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
u64 tail_len;
u64 orig_start = offset;
u64 cur_offset;
+   unsigned char blocksize_bits;
u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
u64 drop_end;
int ret = 0;
int err = 0;
int rsv_count;
-   bool same_page;
+   bool same_block;
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
u64 ino_size;
-   bool truncated_page = false;
+   bool truncated_block = false;
bool updated_inode = false;
 
+   blocksize_bits = inode->i_blkbits;
+
ret = btrfs_wait_ordered_range(inode, offset, len);
if (ret)
return ret;
 
mutex_lock(&inode->i_mutex);
-   ino_size = round_up(inode->i_size, PAGE_CACHE_SIZE);
+   ino_size = round_up(inode->i_size, root->sectorsize);
ret = find_first_non_hole(inode, &offset, &len);
if (ret < 0)
goto out_only_mutex;
@@ -2309,31 +2312,30 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
lockstart = round_up(offset, BTRFS_I(inode)->root->sectorsize);
lockend = round_down(offset + len,
 BTRFS_I(inode)->root->sectorsize) - 1;
-   same_page = ((offset >> PAGE_CACHE_SHIFT) ==
-   ((offset + len - 1) >> PAGE_CACHE_SHIFT));
-
+   same_block = ((offset >> blocksize_bits)
+   == ((offset + len - 1) >> blocksize_bits));
/*
-* We needn't truncate any page which is beyond the end of the file
+* We needn't truncate any block which is beyond the end of the file
 * because we are sure there is no data there.
 */
/*
-* Only do this if we are in the same page and we aren't doing the
-* entire page.
+* Only do this if we are in the same block and we aren't doing the
+* entire block.
 */
-   if (same_page && len < PAGE_CACHE_SIZE) {
+   if (same_block && len < root->sectorsize) {
if (offset < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode, offset, len, 0);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode, offset, len, 0);
} else {
ret = 0;
}
goto out_only_mutex;
}
 
-   /* zero back part of the first page */
+   /* zero back part of the first block */
if (offset < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode, offset, 0, 0);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode, offset, 0, 0);
if (ret) {
mutex_unlock(&inode->i_mutex);
return ret;
@@ -2368,9 +2370,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
if (!ret) {
/* zero the front end of the last page */
if (tail_start + tail_len < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode,
-   tail_start + tail_len, 0, 1);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode,
+   tail_start + tail_len,
+   0, 1);
if (ret)
   

[PATCH V5 02/13] Btrfs: Compute and look up csums based on sectorsized blocks

2015-09-30 Thread Chandan Rajendra
Checksums are applicable to sectorsize units. The current code uses
bio->bv_len units to compute and look up checksums. This works on machines
where sectorsize == PAGE_SIZE. This patch makes the checksum computation and
lookup code work with sectorsize units.

Reviewed-by: Liu Bo 
Reviewed-by: Josef Bacik 
Signed-off-by: Chandan Rajendra 
---
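Note (not part of the patch): a minimal userspace sketch of how many
checksums one bio vector needs once checksums are tracked per sector. The
function name csums_for_segment() is hypothetical; the expression mirrors the
nr_sectors computation in btrfs_csum_one_bio() in the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of sector-sized blocks (and therefore checksums) covered by a
 * segment of bv_len bytes, where the sector size is (1 << blkbits).
 */
static unsigned int csums_for_segment(uint32_t bv_len, unsigned int blkbits)
{
	uint32_t sectorsize;

	assert(blkbits < 32);
	sectorsize = (uint32_t)1 << blkbits;
	return (bv_len + sectorsize - 1) >> blkbits;
}
```

A 64K segment on a 4K-sector filesystem needs 16 checksums, where the
per-bvec code accounted for one.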
 fs/btrfs/file-item.c | 93 +---
 1 file changed, 59 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 58ece65..818c859 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -172,6 +172,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
u64 item_start_offset = 0;
u64 item_last_offset = 0;
u64 disk_bytenr;
+   u64 page_bytes_left;
u32 diff;
int nblocks;
int bio_index = 0;
@@ -220,6 +221,8 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
if (dio)
offset = logical_offset;
+
+   page_bytes_left = bvec->bv_len;
while (bio_index < bio->bi_vcnt) {
if (!dio)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
@@ -243,7 +246,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
if (BTRFS_I(inode)->root->root_key.objectid ==
BTRFS_DATA_RELOC_TREE_OBJECTID) {
set_extent_bits(io_tree, offset,
-   offset + bvec->bv_len - 1,
+   offset + root->sectorsize - 1,
EXTENT_NODATASUM, GFP_NOFS);
} else {

btrfs_info(BTRFS_I(inode)->root->fs_info,
@@ -281,11 +284,17 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 found:
csum += count * csum_size;
nblocks -= count;
-   bio_index += count;
+
while (count--) {
-   disk_bytenr += bvec->bv_len;
-   offset += bvec->bv_len;
-   bvec++;
+   disk_bytenr += root->sectorsize;
+   offset += root->sectorsize;
+   page_bytes_left -= root->sectorsize;
+   if (!page_bytes_left) {
+   bio_index++;
+   bvec++;
+   page_bytes_left = bvec->bv_len;
+   }
+
}
}
btrfs_free_path(path);
@@ -432,6 +441,8 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
struct bio_vec *bvec = bio->bi_io_vec;
int bio_index = 0;
int index;
+   int nr_sectors;
+   int i;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
u64 offset;
@@ -451,7 +462,7 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
ordered = btrfs_lookup_ordered_extent(inode, offset);
-   BUG_ON(!ordered); /* Logic error */
+   ASSERT(ordered); /* Logic error */
sums->bytenr = (u64)bio->bi_iter.bi_sector << 9;
index = 0;
 
@@ -459,41 +470,55 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
if (!contig)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
-   if (offset >= ordered->file_offset + ordered->len ||
-   offset < ordered->file_offset) {
-   unsigned long bytes_left;
-   sums->len = this_sum_bytes;
-   this_sum_bytes = 0;
-   btrfs_add_ordered_sum(inode, ordered, sums);
-   btrfs_put_ordered_extent(ordered);
+   data = kmap_atomic(bvec->bv_page);
 
-   bytes_left = bio->bi_iter.bi_size - total_bytes;
+   nr_sectors = (bvec->bv_len + root->sectorsize - 1)
+   >> inode->i_blkbits;
+
+   for (i = 0; i < nr_sectors; i++) {
+   if (offset >= ordered->file_offset + ordered->len ||
+   offset < ordered->file_offset) {
+   unsigned long bytes_left;
+
+   kunmap_atomic(data);
+   sums->len = this_sum_bytes;
+   this_sum_bytes = 0;
+   btrfs_add_ordered_sum(inode, ordered, sums);
+   btrfs_put_ordered_extent(ordered);
+
+   bytes_left = 

[PATCH V5 03/13] Btrfs: Direct I/O read: Work on sectorsized blocks

2015-09-30 Thread Chandan Rajendra
The direct I/O read's endio and corresponding repair functions work on
page sized blocks. This commit adds the ability for direct I/O read to work on
subpagesized blocks.

Signed-off-by: Chandan Rajendra 
---
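Note (not part of the patch): a minimal userspace sketch of how one failed
bio vector is broken into per-sector repair requests. The struct repair_req
and the function plan_repairs() are hypothetical; the fields correspond to
the pgoff, start and end arguments passed to dio_read_error() in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* One sector-sized repair request. */
struct repair_req {
	unsigned int pgoff;	/* offset of the sector within the page */
	uint64_t start;		/* first byte of the sector */
	uint64_t end;		/* last byte of the sector (inclusive) */
};

/*
 * Fill out[] with one request per sector of the segment and return the
 * number of requests. bv_len must be a multiple of sectorsize.
 */
static unsigned int plan_repairs(uint64_t logical, unsigned int bv_offset,
				 unsigned int bv_len, unsigned int sectorsize,
				 struct repair_req *out, unsigned int max)
{
	unsigned int nr_sectors;
	unsigned int i;

	assert(sectorsize != 0 && bv_len % sectorsize == 0);
	nr_sectors = bv_len / sectorsize;
	assert(nr_sectors <= max);

	for (i = 0; i < nr_sectors; i++) {
		out[i].pgoff = bv_offset + i * sectorsize;
		out[i].start = logical + (uint64_t)i * sectorsize;
		out[i].end = out[i].start + sectorsize - 1;
	}
	return nr_sectors;
}
```

A 16K segment with 4K sectors yields four requests, each covering exactly one
sector, which is what the ASSERT()s added to btrfs_retry_endio_nocsum()
expect of a repair bio.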
 fs/btrfs/inode.c | 96 ++--
 1 file changed, 73 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b7e439b..5a47731 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7664,9 +7664,9 @@ static int btrfs_check_dio_repairable(struct inode *inode,
 }
 
 static int dio_read_error(struct inode *inode, struct bio *failed_bio,
- struct page *page, u64 start, u64 end,
- int failed_mirror, bio_end_io_t *repair_endio,
- void *repair_arg)
+   struct page *page, unsigned int pgoff,
+   u64 start, u64 end, int failed_mirror,
+   bio_end_io_t *repair_endio, void *repair_arg)
 {
struct io_failure_record *failrec;
struct bio *bio;
@@ -7687,7 +7687,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
return -EIO;
}
 
-   if (failed_bio->bi_vcnt > 1)
+   if ((failed_bio->bi_vcnt > 1)
+   || (failed_bio->bi_io_vec->bv_len
+   > BTRFS_I(inode)->root->sectorsize))
read_mode = READ_SYNC | REQ_FAILFAST_DEV;
else
read_mode = READ_SYNC;
@@ -7695,7 +7697,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
isector = start - btrfs_io_bio(failed_bio)->logical;
isector >>= inode->i_sb->s_blocksize_bits;
bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
- 0, isector, repair_endio, repair_arg);
+   pgoff, isector, repair_endio, repair_arg);
if (!bio) {
free_io_failure(inode, failrec);
return -EIO;
@@ -7725,12 +7727,17 @@ struct btrfs_retry_complete {
 static void btrfs_retry_endio_nocsum(struct bio *bio, int err)
 {
struct btrfs_retry_complete *done = bio->bi_private;
+   struct inode *inode;
struct bio_vec *bvec;
int i;
 
if (err)
goto end;
 
+   ASSERT(bio->bi_vcnt == 1);
+   inode = bio->bi_io_vec->bv_page->mapping->host;
+   ASSERT(bio->bi_io_vec->bv_len == BTRFS_I(inode)->root->sectorsize);
+
done->uptodate = 1;
bio_for_each_segment_all(bvec, bio, i)
clean_io_failure(done->inode, done->start, bvec->bv_page, 0);
@@ -7745,22 +7752,30 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
struct bio_vec *bvec;
struct btrfs_retry_complete done;
u64 start;
+   unsigned int pgoff;
+   u32 sectorsize;
+   int nr_sectors;
int i;
int ret;
 
+   sectorsize = BTRFS_I(inode)->root->sectorsize;
+
start = io_bio->logical;
done.inode = inode;
 
bio_for_each_segment_all(bvec, &io_bio->bio, i) {
-try_again:
+   nr_sectors = bvec->bv_len >> inode->i_blkbits;
+   pgoff = bvec->bv_offset;
+
+next_block_or_try_again:
done.uptodate = 0;
done.start = start;
init_completion(&done.done);
 
-   ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page, start,
-start + bvec->bv_len - 1,
-io_bio->mirror_num,
-btrfs_retry_endio_nocsum, &done);
+   ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+   pgoff, start, start + sectorsize - 1,
+   io_bio->mirror_num,
+   btrfs_retry_endio_nocsum, &done);
if (ret)
return ret;
 
@@ -7768,10 +7783,15 @@ try_again:
 
if (!done.uptodate) {
/* We might have another mirror, so try again */
-   goto try_again;
+   goto next_block_or_try_again;
}
 
-   start += bvec->bv_len;
+   start += sectorsize;
+
+   if (--nr_sectors) {
+   pgoff += sectorsize;
+   goto next_block_or_try_again;
+   }
}
 
return 0;
@@ -7781,7 +7801,9 @@ static void btrfs_retry_endio(struct bio *bio, int err)
 {
struct btrfs_retry_complete *done = bio->bi_private;
struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
+   struct inode *inode;
struct bio_vec *bvec;
+   u64 start;
int uptodate;
int ret;
int i;
@@ -7790,13 +7812,20 @@ static void btrfs_retry_endio(struct bio *bio, int err)
goto end;
 
uptodate = 1;
+
+   start = done->start;
+
+

[PATCH V5 06/13] Btrfs: Search for all ordered extents that could span across a page

2015-09-30 Thread Chandan Rajendra
In a subpagesize-blocksize scenario, it is not sufficient to search using the
first byte of the page to make sure that there are no ordered extents
present across the page. Fix this by searching the whole page range.

Signed-off-by: Chandan Rajendra 
---
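Note (not part of the patch): a minimal userspace sketch contrasting a point
lookup with a range lookup. The struct range and both helpers are
hypothetical stand-ins for the behaviour of btrfs_lookup_ordered_extent()
(one offset) and btrfs_lookup_ordered_range() (offset plus length) as they
are used in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A byte range [start, start + len). */
struct range {
	uint64_t start;
	uint64_t len;
};

/* True if the ordered extent contains the single offset off. */
static bool point_lookup_hits(struct range ordered, uint64_t off)
{
	return off >= ordered.start && off < ordered.start + ordered.len;
}

/* True if the ordered extent overlaps [start, start + len). */
static bool range_lookup_hits(struct range ordered, uint64_t start,
			      uint64_t len)
{
	return ordered.start < start + len &&
	       start < ordered.start + ordered.len;
}
```

An ordered extent covering the third 4K block of a 64K page is missed by a
lookup at the first byte of the page but found by a lookup over the whole
page.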
 fs/btrfs/extent_io.c |  3 ++-
 fs/btrfs/inode.c | 25 ++---
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 11aa8f7..0ee486a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3224,7 +3224,8 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 
while (1) {
lock_extent(tree, start, end);
-   ordered = btrfs_lookup_ordered_extent(inode, start);
+   ordered = btrfs_lookup_ordered_range(inode, start,
+   PAGE_CACHE_SIZE);
if (!ordered)
break;
unlock_extent(tree, start, end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5e6052d..4fbe9de 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1975,7 +1975,8 @@ again:
if (PagePrivate2(page))
goto out;
 
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start,
+   PAGE_CACHE_SIZE);
if (ordered) {
unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start,
 page_end, &cached_state, GFP_NOFS);
@@ -8552,6 +8553,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
struct extent_state *cached_state = NULL;
u64 page_start = page_offset(page);
u64 page_end = page_start + PAGE_CACHE_SIZE - 1;
+   u64 start;
+   u64 end;
int inode_evicting = inode->i_state & I_FREEING;
 
/*
@@ -8571,14 +8574,18 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
if (!inode_evicting)
lock_extent_bits(tree, page_start, page_end, 0, &cached_state);
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+again:
+   start = page_start;
+   ordered = btrfs_lookup_ordered_range(inode, start,
+   page_end - start + 1);
if (ordered) {
+   end = min(page_end, ordered->file_offset + ordered->len - 1);
/*
 * IO on this page will never be started, so we need
 * to account for any ordered extents now
 */
if (!inode_evicting)
-   clear_extent_bit(tree, page_start, page_end,
+   clear_extent_bit(tree, start, end,
 EXTENT_DIRTY | EXTENT_DELALLOC |
 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
 EXTENT_DEFRAG, 1, 0, &cached_state,
@@ -8595,22 +8602,26 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
spin_lock_irq(&tree->lock);
set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
-   new_len = page_start - ordered->file_offset;
+   new_len = start - ordered->file_offset;
if (new_len < ordered->truncated_len)
ordered->truncated_len = new_len;
spin_unlock_irq(&tree->lock);
 
if (btrfs_dec_test_ordered_pending(inode, &ordered,
-  page_start,
-  PAGE_CACHE_SIZE, 1))
+  start,
+  end - start + 1, 1))
btrfs_finish_ordered_io(ordered);
}
btrfs_put_ordered_extent(ordered);
if (!inode_evicting) {
cached_state = NULL;
-   lock_extent_bits(tree, page_start, page_end, 0,
+   lock_extent_bits(tree, start, end, 0,
 &cached_state);
}
+
+   start = end + 1;
+   if (start < page_end)
+   goto again;
}
 
if (!inode_evicting) {
-- 
2.1.0



[PATCH V5 11/13] Btrfs: Clean pte corresponding to page straddling i_size

2015-09-30 Thread Chandan Rajendra
When extending a file by either "truncate up" or by writing beyond i_size, the
page which contained i_size needs to be marked "read only" so that future
writes to the page via the mmap interface cause btrfs_page_mkwrite() to be
invoked. If not, a write performed via the mmap interface after extending the
file will find the page to be writeable and continue writing to the page
without invoking btrfs_page_mkwrite(), i.e. we end up writing to a file without
reserving disk space.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file.c  | 12 ++--
 fs/btrfs/inode.c |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 360d56d..5715e29 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1757,6 +1757,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
ssize_t err;
loff_t pos;
size_t count;
+   loff_t oldsize;
+   int clean_page = 0;
 
mutex_lock(&inode->i_mutex);
err = generic_write_checks(iocb, from);
@@ -1795,14 +1797,17 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
pos = iocb->ki_pos;
count = iov_iter_count(from);
start_pos = round_down(pos, root->sectorsize);
-   if (start_pos > i_size_read(inode)) {
+   oldsize = i_size_read(inode);
+   if (start_pos > oldsize) {
/* Expand hole size to cover write data, preventing empty gap */
end_pos = round_up(pos + count, root->sectorsize);
-   err = btrfs_cont_expand(inode, i_size_read(inode), end_pos);
+   err = btrfs_cont_expand(inode, oldsize, end_pos);
if (err) {
mutex_unlock(&inode->i_mutex);
goto out;
}
+   if (start_pos > round_up(oldsize, root->sectorsize))
+   clean_page = 1;
}
 
if (sync)
@@ -1814,6 +1819,9 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
num_written = __btrfs_buffered_write(file, from, pos);
if (num_written > 0)
iocb->ki_pos = pos + num_written;
+   if (clean_page)
+   pagecache_isize_extended(inode, oldsize,
+   i_size_read(inode));
}
 
mutex_unlock(&inode->i_mutex);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c937357..f31da87 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4853,7 +4853,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
}
 
if (newsize > oldsize) {
-   truncate_pagecache(inode, newsize);
/*
 * Don't do an expanding truncate while snapshoting is ongoing.
 * This is to ensure the snapshot captures a fully consistent
@@ -4876,6 +4875,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 
i_size_write(inode, newsize);
btrfs_ordered_update_i_size(inode, i_size_read(inode), NULL);
+   pagecache_isize_extended(inode, oldsize, newsize);
ret = btrfs_update_inode(trans, root, inode);
btrfs_end_write_no_snapshoting(root);
btrfs_end_transaction(trans, root);
-- 
2.1.0



[RFC PATCH V4 09/13] Btrfs: Limit inline extents to root->sectorsize

2015-09-30 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b1ceba4..b2eedb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0
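
As a rough user-space illustration of the check changed by the patch above, the sketch below mirrors only the conditions visible in the hunk. The function name may_inline() and the max_inline parameter (standing in for BTRFS_MAX_INLINE_DATA_SIZE(root)) are assumptions made for this example, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the size checks shown in the hunk for
 * cow_file_range_inline(). Returns true when the range ending at
 * actual_end passes those checks. Checks outside the hunk are not modelled.
 */
static bool may_inline(uint64_t start, uint64_t actual_end,
                       uint64_t data_len, uint64_t compressed_size,
                       uint32_t sectorsize, uint64_t max_inline)
{
    /* sectorsize must be a non-zero power of two for the mask below */
    assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);

    if (start > 0 ||
        actual_end > sectorsize ||      /* before the patch: PAGE_CACHE_SIZE */
        data_len > max_inline ||
        (!compressed_size && (actual_end & (sectorsize - 1)) == 0))
        return false;

    return true;
}
```

With a 4K sectorsize on a machine with 64K pages, an 8K range is rejected by this check, whereas a comparison against the page size would have let it through.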



[RFC PATCH V4 03/13] Btrfs: Direct I/O read: Work on sectorsized blocks

2015-09-30 Thread Chandan Rajendra
The direct I/O read's endio and corresponding repair functions work on
page sized blocks. This commit adds the ability for direct I/O read to work on
subpagesized blocks.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 96 ++--
 1 file changed, 73 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b7e439b..5a47731 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7664,9 +7664,9 @@ static int btrfs_check_dio_repairable(struct inode *inode,
 }
 
 static int dio_read_error(struct inode *inode, struct bio *failed_bio,
- struct page *page, u64 start, u64 end,
- int failed_mirror, bio_end_io_t *repair_endio,
- void *repair_arg)
+   struct page *page, unsigned int pgoff,
+   u64 start, u64 end, int failed_mirror,
+   bio_end_io_t *repair_endio, void *repair_arg)
 {
struct io_failure_record *failrec;
struct bio *bio;
@@ -7687,7 +7687,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
return -EIO;
}
 
-   if (failed_bio->bi_vcnt > 1)
+   if ((failed_bio->bi_vcnt > 1)
+   || (failed_bio->bi_io_vec->bv_len
+   > BTRFS_I(inode)->root->sectorsize))
read_mode = READ_SYNC | REQ_FAILFAST_DEV;
else
read_mode = READ_SYNC;
@@ -7695,7 +7697,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
isector = start - btrfs_io_bio(failed_bio)->logical;
isector >>= inode->i_sb->s_blocksize_bits;
bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
- 0, isector, repair_endio, repair_arg);
+   pgoff, isector, repair_endio, repair_arg);
if (!bio) {
free_io_failure(inode, failrec);
return -EIO;
@@ -7725,12 +7727,17 @@ struct btrfs_retry_complete {
 static void btrfs_retry_endio_nocsum(struct bio *bio, int err)
 {
struct btrfs_retry_complete *done = bio->bi_private;
+   struct inode *inode;
struct bio_vec *bvec;
int i;
 
if (err)
goto end;
 
+   ASSERT(bio->bi_vcnt == 1);
+   inode = bio->bi_io_vec->bv_page->mapping->host;
+   ASSERT(bio->bi_io_vec->bv_len == BTRFS_I(inode)->root->sectorsize);
+
done->uptodate = 1;
bio_for_each_segment_all(bvec, bio, i)
clean_io_failure(done->inode, done->start, bvec->bv_page, 0);
@@ -7745,22 +7752,30 @@ static int __btrfs_correct_data_nocsum(struct inode *inode,
struct bio_vec *bvec;
struct btrfs_retry_complete done;
u64 start;
+   unsigned int pgoff;
+   u32 sectorsize;
+   int nr_sectors;
int i;
int ret;
 
+   sectorsize = BTRFS_I(inode)->root->sectorsize;
+
start = io_bio->logical;
done.inode = inode;
 
bio_for_each_segment_all(bvec, &io_bio->bio, i) {
-try_again:
+   nr_sectors = bvec->bv_len >> inode->i_blkbits;
+   pgoff = bvec->bv_offset;
+
+next_block_or_try_again:
done.uptodate = 0;
done.start = start;
init_completion(&done.done);
 
-   ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page, start,
-start + bvec->bv_len - 1,
-io_bio->mirror_num,
-btrfs_retry_endio_nocsum, &done);
+   ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+   pgoff, start, start + sectorsize - 1,
+   io_bio->mirror_num,
+   btrfs_retry_endio_nocsum, &done);
if (ret)
return ret;
 
@@ -7768,10 +7783,15 @@ try_again:
 
if (!done.uptodate) {
/* We might have another mirror, so try again */
-   goto try_again;
+   goto next_block_or_try_again;
}
 
-   start += bvec->bv_len;
+   start += sectorsize;
+
+   if (nr_sectors--) {
+   pgoff += sectorsize;
+   goto next_block_or_try_again;
+   }
}
 
return 0;
@@ -7781,7 +7801,9 @@ static void btrfs_retry_endio(struct bio *bio, int err)
 {
struct btrfs_retry_complete *done = bio->bi_private;
struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
+   struct inode *inode;
struct bio_vec *bvec;
+   u64 start;
int uptodate;
int ret;
int i;
@@ -7790,13 +7812,20 @@ static void btrfs_retry_endio(struct bio *bio, int err)
goto end;
 
uptodate = 1;
+
+   start = done->start;
+
+

[RFC PATCH V4 05/13] Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units

2015-09-30 Thread Chandan Rajendra
In subpagesize-blocksize scenario, if i_size occurs in a block which is not
the last block in the page, then the space to be reserved should be calculated
appropriately.

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5301d4e..5e6052d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8659,11 +8659,24 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
loff_t size;
int ret;
int reserved = 0;
+   u64 reserved_space;
u64 page_start;
u64 page_end;
+   u64 end;
+
+   reserved_space = PAGE_CACHE_SIZE;
 
sb_start_pagefault(inode->i_sb);
-   ret  = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+
+   /*
+ Reserving delalloc space after obtaining the page lock can lead to
+ deadlock. For example, if a dirty page is locked by this function
+ and the call to btrfs_delalloc_reserve_space() ends up triggering
+ dirty page write out, then the btrfs_writepage() function could
+ end up waiting indefinitely to get a lock on the page currently
+ being processed by btrfs_page_mkwrite() function.
+*/
+   ret  = btrfs_delalloc_reserve_space(inode, reserved_space);
if (!ret) {
ret = file_update_time(vma->vm_file);
reserved = 1;
@@ -8684,6 +8697,7 @@ again:
size = i_size_read(inode);
page_start = page_offset(page);
page_end = page_start + PAGE_CACHE_SIZE - 1;
+   end = page_end;
 
if ((page->mapping != inode->i_mapping) ||
(page_start >= size)) {
@@ -8699,7 +8713,7 @@ again:
 * we can't set the delalloc bits if there are pending ordered
 * extents.  Drop our locks and wait for them to finish
 */
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start, page_end);
if (ordered) {
unlock_extent_cached(io_tree, page_start, page_end,
 &cached_state, GFP_NOFS);
@@ -8709,6 +8723,18 @@ again:
goto again;
}
 
+   if (page->index == ((size - 1) >> PAGE_CACHE_SHIFT)) {
+   reserved_space = round_up(size - page_start, root->sectorsize);
+   if (reserved_space < PAGE_CACHE_SIZE) {
+   end = page_start + reserved_space - 1;
+   spin_lock(&BTRFS_I(inode)->lock);
+   BTRFS_I(inode)->outstanding_extents++;
+   spin_unlock(&BTRFS_I(inode)->lock);
+   btrfs_delalloc_release_space(inode,
+   PAGE_CACHE_SIZE - reserved_space);
+   }
+   }
+
/*
 * XXX - page_mkwrite gets called every time the page is dirtied, even
 * if it was already dirty, so for space accounting reasons we need to
@@ -8716,12 +8742,12 @@ again:
 * is probably a better way to do this, but for now keep consistent with
 * prepare_pages in the normal write path.
 */
-   clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end,
+   clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  EXTENT_DIRTY | EXTENT_DELALLOC |
  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
  0, 0, &cached_state, GFP_NOFS);
 
-   ret = btrfs_set_extent_delalloc(inode, page_start, page_end,
+   ret = btrfs_set_extent_delalloc(inode, page_start, end,
&cached_state);
if (ret) {
unlock_extent_cached(io_tree, page_start, page_end,
@@ -8760,7 +8786,7 @@ out_unlock:
}
unlock_page(page);
 out:
-   btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+   btrfs_delalloc_release_space(inode, reserved_space);
 out_noreserve:
sb_end_pagefault(inode->i_sb);
return ret;
-- 
2.1.0
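
The reservation size computed by the patch above can be sketched in user space as follows. This is a simplified model under stated assumptions: mkwrite_reserved_space() and round_up_pow2() are hypothetical helpers written for this example, and the page size is passed in as a parameter instead of using PAGE_CACHE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Bytes that remain reserved for a page fault on page_index.
 * Only the page that contains i_size can need less than a full page.
 */
static uint64_t mkwrite_reserved_space(uint64_t i_size, uint64_t page_index,
                                       uint32_t page_size, uint32_t sectorsize)
{
    uint64_t page_start = page_index * page_size;
    uint64_t reserved = page_size;

    /* The kernel function bails out earlier when page_start >= i_size. */
    assert(page_start < i_size);

    if (page_index == (i_size - 1) / page_size) {
        uint64_t needed = round_up_pow2(i_size - page_start, sectorsize);

        if (needed < page_size)
            reserved = needed;
    }
    return reserved;
}
```

For a 70000-byte file with 64K pages and a 4K sectorsize, the second page holds 4464 bytes of file data, so the sketch keeps 8192 bytes reserved and the remaining 57344 bytes would be released.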



Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg

2015-09-30 Thread Filipe Manana
On Wed, Sep 30, 2015 at 9:26 AM, Zhao Lei  wrote:
> Hi, Filipe Manana
>
>> -Original Message-
>> From: Filipe Manana [mailto:fdman...@gmail.com]
>> Sent: Wednesday, September 30, 2015 3:41 PM
>> To: Zhao Lei 
>> Cc: linux-btrfs@vger.kernel.org
>> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by balance bg
>>
>> On Wed, Sep 30, 2015 at 5:20 AM, Zhao Lei  wrote:
>> > Hi, Filipe Manana
>> >
>> > Thanks for reviewing.
>> >
>> >> -Original Message-
>> >> From: Filipe Manana [mailto:fdman...@gmail.com]
>> >> Sent: Tuesday, September 29, 2015 11:48 PM
>> >> To: Zhao Lei 
>> >> Cc: linux-btrfs@vger.kernel.org
>> >> Subject: Re: [PATCH 2/2] btrfs: Fix lost-data-profile caused by
>> >> balance bg
>> >>
>> >> On Tue, Sep 29, 2015 at 2:51 PM, Zhao Lei  wrote:
>> >> > Reproduce:
>> >> >  (In integration-4.3 branch)
>> >> >
>> >> >  TEST_DEV=(/dev/vdg /dev/vdh)
>> >> >  TEST_DIR=/mnt/tmp
>> >> >
>> >> >  umount "$TEST_DEV" >/dev/null
>> >> >  mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"
>> >> >
>> >> >  mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
>> >> >  btrfs balance start -dusage=0 $TEST_DIR
>> >> >  btrfs filesystem usage $TEST_DIR
>> >> >
>> >> >  dd if=/dev/zero of="$TEST_DIR"/file count=100
>> >> >  btrfs filesystem usage $TEST_DIR
>> >> >
>> >> > Result:
>> >> >  We can see "no data chunk" in first "btrfs filesystem usage":
>> >> >  # btrfs filesystem usage $TEST_DIR
>> >> >  Overall:
>> >> > ...
>> >> >  Metadata,single: Size:8.00MiB, Used:0.00B
>> >> > /dev/vdg    8.00MiB
>> >> >  Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
>> >> > /dev/vdg  122.88MiB
>> >> > /dev/vdh  122.88MiB
>> >> >  System,single: Size:4.00MiB, Used:0.00B
>> >> > /dev/vdg    4.00MiB
>> >> >  System,RAID1: Size:8.00MiB, Used:16.00KiB
>> >> > /dev/vdg    8.00MiB
>> >> > /dev/vdh    8.00MiB
>> >> >  Unallocated:
>> >> > /dev/vdg    1.06GiB
>> >> > /dev/vdh    1.07GiB
>> >> >
>> >> >  And "data chunks changed from raid1 to single" in second  "btrfs
>> >> > filesystem usage":
>> >> >  # btrfs filesystem usage $TEST_DIR
>> >> >  Overall:
>> >> > ...
>> >> >  Data,single: Size:256.00MiB, Used:0.00B
>> >> > /dev/vdh  256.00MiB
>> >> >  Metadata,single: Size:8.00MiB, Used:0.00B
>> >> > /dev/vdg    8.00MiB
>> >> >  Metadata,RAID1: Size:122.88MiB, Used:112.00KiB
>> >> > /dev/vdg  122.88MiB
>> >> > /dev/vdh  122.88MiB
>> >> >  System,single: Size:4.00MiB, Used:0.00B
>> >> > /dev/vdg    4.00MiB
>> >> >  System,RAID1: Size:8.00MiB, Used:16.00KiB
>> >> > /dev/vdg    8.00MiB
>> >> > /dev/vdh    8.00MiB
>> >> >  Unallocated:
>> >> > /dev/vdg    1.06GiB
>> >> > /dev/vdh  841.92MiB
>> >> >
>> >> > Reason:
>> >> >  When the filesystem contains no data, btrfs balance deletes the
>> >> >  last data chunk, so the "fi usage" command shows "no data chunk".
>> >> >
>> >> >  When we then write to the fs, the only available data profile is
>> >> >  0x0, so all new chunks are allocated as single type.
>> >> >
>> >> > Fix:
>> >> >  Allocate a data chunk explicitly in balance operation, to reserve
>> >> > at least one data chunk in the filesystem.
>> >>
>> >> Allocate a data chunk explicitly to ensure we don't lose the raid profile
>> >> for data.
>> >>
>> >
>> > Thanks, will change in v2.
>> >
>> >> >
>> >> > Test:
>> >> >  Test by above script, and confirmed the logic by debug output.
>> >> >
>> >> > Signed-off-by: Zhao Lei 
>> >> > ---
>> >> >  fs/btrfs/volumes.c | 19 +++
>> >> >  1 file changed, 19 insertions(+)
>> >> >
>> >> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> >> > index 6fc73586..3d5e41e 100644
>> >> > --- a/fs/btrfs/volumes.c
>> >> > +++ b/fs/btrfs/volumes.c
>> >> > @@ -3277,6 +3277,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
>> >> > u64 limit_data = bctl->data.limit;
>> >> > u64 limit_meta = bctl->meta.limit;
>> >> > u64 limit_sys = bctl->sys.limit;
>> >> > +   int chunk_reserved = 0;
>> >> >
>> >> > /* step one make some room on all the devices */
>> >> > devices = &fs_info->fs_devices->devices;
>> >> > @@ -3387,6 +3388,24 @@ again:
>> >> > goto loop;
>> >> > }
>> >> >
>> >> > +   if (!chunk_reserved) {
>> >> > +   trans = btrfs_start_transaction(chunk_root, 0);
>> >> > +   if (IS_ERR(trans)) {
>> >> > +   mutex_unlock(&fs_info->delete_unused_bgs_mutex);
>> >> > +   ret = PTR_ERR(trans);
>> >> > +   goto error;
>> >> > +   }
>> >> > +
>> >> > +   ret = btrfs_force_chunk_alloc(trans,
>> >> > + chunk_root, 1);
>> >>
>> >> Can 

[RFC PATCH V4 10/13] Btrfs: Fix block size returned to user space

2015-09-30 Thread Chandan Rajendra
btrfs_getattr() returns PAGE_CACHE_SIZE as the block size. Since
generic_fillattr() already does the right thing (by obtaining block size
from inode->i_blkbits), just remove the statement from btrfs_getattr.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b2eedb9..c937357 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9197,7 +9197,6 @@ static int btrfs_getattr(struct vfsmount *mnt,
 
generic_fillattr(inode, stat);
stat->dev = BTRFS_I(inode)->root->anon_dev;
-   stat->blksize = PAGE_CACHE_SIZE;
 
spin_lock(&BTRFS_I(inode)->lock);
delalloc_bytes = BTRFS_I(inode)->delalloc_bytes;
-- 
2.1.0



[RFC PATCH V4 08/13] Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length

2015-09-30 Thread Chandan Rajendra
In subpagesize-blocksize scenario, map_length can be less than the length of a
bio vector. Such a condition may cause btrfs_submit_direct_hook() to submit a
zero length bio. Fix this by comparing map_length against block size rather
than with bv_len.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4fbe9de..b1ceba4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8148,9 +8148,11 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
u64 file_offset = dip->logical_offset;
u64 submit_len = 0;
u64 map_length;
-   int nr_pages = 0;
-   int ret;
+   u32 blocksize = root->sectorsize;
int async_submit = 0;
+   int nr_sectors;
+   int ret;
+   int i;
 
map_length = orig_bio->bi_iter.bi_size;
ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
@@ -8180,9 +8182,12 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
atomic_inc(&dip->pending_bios);
 
while (bvec <= (orig_bio->bi_io_vec + orig_bio->bi_vcnt - 1)) {
-   if (map_length < submit_len + bvec->bv_len ||
-   bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-bvec->bv_offset) < bvec->bv_len) {
+   nr_sectors = bvec->bv_len >> inode->i_blkbits;
+   i = 0;
+next_block:
+   if (unlikely(map_length < submit_len + blocksize ||
+   bio_add_page(bio, bvec->bv_page, blocksize,
+   bvec->bv_offset + (i * blocksize)) < blocksize)) {
/*
 * inc the count before we submit the bio so
 * we know the end IO handler won't happen before
@@ -8203,7 +8208,6 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
file_offset += submit_len;
 
submit_len = 0;
-   nr_pages = 0;
 
bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev,
  start_sector, GFP_NOFS);
@@ -8221,9 +8225,14 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
bio_put(bio);
goto out_err;
}
+
+   goto next_block;
} else {
-   submit_len += bvec->bv_len;
-   nr_pages++;
+   submit_len += blocksize;
+   if (--nr_sectors) {
+   i++;
+   goto next_block;
+   }
bvec++;
}
}
-- 
2.1.0
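
The difference between the old per-vector test and the new per-block test in the patch above can be modelled in user space. The sketch below is a deliberately simplified model of filling the first bio only: both function names are hypothetical, and it assumes bio_add_page() never fails, which the kernel code does not assume.

```c
#include <assert.h>
#include <stdint.h>

/* Old test: a bio vector is added whole or not at all. */
static uint64_t bytes_added_per_bvec(uint64_t map_length, uint32_t bv_len)
{
    return map_length < bv_len ? 0 : bv_len;
}

/* New test: add one block at a time until the mapping is used up. */
static uint64_t bytes_added_per_block(uint64_t map_length, uint32_t bv_len,
                                      uint32_t blocksize)
{
    uint64_t submit_len = 0;
    uint32_t nr_sectors;

    /* A bio vector is assumed to hold a whole number of blocks. */
    assert(blocksize != 0 && bv_len % blocksize == 0);

    nr_sectors = bv_len / blocksize;
    while (nr_sectors > 0 && map_length >= submit_len + blocksize) {
        submit_len += blocksize;
        nr_sectors--;
    }
    return submit_len;
}
```

With a 64K bio vector, a 4K block size and a map_length of 8K, the per-vector test adds nothing (the zero-length bio described in the commit message), while the per-block test adds 8K.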



[RFC PATCH V4 02/13] Btrfs: Compute and look up csums based on sectorsized blocks

2015-09-30 Thread Chandan Rajendra
Checksums are applicable to sectorsize units. The current code uses
bio->bv_len units to compute and look up checksums. This works on machines
where sectorsize == PAGE_SIZE. This patch makes the checksum computation and
lookup code work with sectorsize units.
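
To illustrate the unit change described above, here is a small user-space sketch of how many checksums one bio vector needs when checksums cover sectorsize units. The helper names are hypothetical, and csum_size is a parameter because the checksum width is not stated in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Number of sectorsize blocks covered by a bio vector of bv_len bytes. */
static uint32_t sectors_in_bvec(uint32_t bv_len, uint32_t sectorsize)
{
    assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
    return (bv_len + sectorsize - 1) / sectorsize;
}

/* Bytes of checksum data needed for that bio vector: one csum per block. */
static uint64_t csum_bytes_for_bvec(uint32_t bv_len, uint32_t sectorsize,
                                    uint32_t csum_size)
{
    return (uint64_t)sectors_in_bvec(bv_len, sectorsize) * csum_size;
}
```

A 64K bio vector with a 4K sectorsize therefore needs 16 checksums instead of one, which is why stepping by bv_len only works when sectorsize equals the page size.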

Reviewed-by: Liu Bo 
Reviewed-by: Josef Bacik 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file-item.c | 93 +---
 1 file changed, 59 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 58ece65..818c859 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -172,6 +172,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
u64 item_start_offset = 0;
u64 item_last_offset = 0;
u64 disk_bytenr;
+   u64 page_bytes_left;
u32 diff;
int nblocks;
int bio_index = 0;
@@ -220,6 +221,8 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
if (dio)
offset = logical_offset;
+
+   page_bytes_left = bvec->bv_len;
while (bio_index < bio->bi_vcnt) {
if (!dio)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
@@ -243,7 +246,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
if (BTRFS_I(inode)->root->root_key.objectid ==
BTRFS_DATA_RELOC_TREE_OBJECTID) {
set_extent_bits(io_tree, offset,
-   offset + bvec->bv_len - 1,
+   offset + root->sectorsize - 1,
EXTENT_NODATASUM, GFP_NOFS);
} else {

btrfs_info(BTRFS_I(inode)->root->fs_info,
@@ -281,11 +284,17 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 found:
csum += count * csum_size;
nblocks -= count;
-   bio_index += count;
+
while (count--) {
-   disk_bytenr += bvec->bv_len;
-   offset += bvec->bv_len;
-   bvec++;
+   disk_bytenr += root->sectorsize;
+   offset += root->sectorsize;
+   page_bytes_left -= root->sectorsize;
+   if (!page_bytes_left) {
+   bio_index++;
+   bvec++;
+   page_bytes_left = bvec->bv_len;
+   }
+
}
}
btrfs_free_path(path);
@@ -432,6 +441,8 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
struct bio_vec *bvec = bio->bi_io_vec;
int bio_index = 0;
int index;
+   int nr_sectors;
+   int i;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
u64 offset;
@@ -451,7 +462,7 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
ordered = btrfs_lookup_ordered_extent(inode, offset);
-   BUG_ON(!ordered); /* Logic error */
+   ASSERT(ordered); /* Logic error */
sums->bytenr = (u64)bio->bi_iter.bi_sector << 9;
index = 0;
 
@@ -459,41 +470,55 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
if (!contig)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
-   if (offset >= ordered->file_offset + ordered->len ||
-   offset < ordered->file_offset) {
-   unsigned long bytes_left;
-   sums->len = this_sum_bytes;
-   this_sum_bytes = 0;
-   btrfs_add_ordered_sum(inode, ordered, sums);
-   btrfs_put_ordered_extent(ordered);
+   data = kmap_atomic(bvec->bv_page);
 
-   bytes_left = bio->bi_iter.bi_size - total_bytes;
+   nr_sectors = (bvec->bv_len + root->sectorsize - 1)
+   >> inode->i_blkbits;
+
+   for (i = 0; i < nr_sectors; i++) {
+   if (offset >= ordered->file_offset + ordered->len ||
+   offset < ordered->file_offset) {
+   unsigned long bytes_left;
+
+   kunmap_atomic(data);
+   sums->len = this_sum_bytes;
+   this_sum_bytes = 0;
+   btrfs_add_ordered_sum(inode, ordered, sums);
+   btrfs_put_ordered_extent(ordered);
+
+   bytes_left = 

[RFC PATCH V4 00/13] Btrfs: Pre subpagesize-blocksize cleanups

2015-09-30 Thread Chandan Rajendra
The patches posted along with this cover letter are cleanups made
during the development of the subpagesize-blocksize patchset. I believe
that they can be integrated with the mainline kernel. Hence I have
posted them separately from the subpagesize-blocksize patchset.

I have tested the patchset by running xfstests on ppc64 and
x86_64. On ppc64, some of the Btrfs specific tests and generic/255
fail because they assume 4K as the filesystem's block size. I have
fixed some of the test cases. I will fix the rest and mail them to the
fstests mailing list in the near future.

Changes from V3:
Two new issues have been fixed by the patches:
1. Btrfs: prepare_pages: Retry adding a page to the page cache.
2. Btrfs: Return valid delalloc range when the page does not have
   PG_Dirty flag set or has been invalidated.
IMHO, the above issues also apply to the "page size == block
size" scenario, but for reasons unknown to me they aren't seen even
when the tests are run for a long time.

Changes from V2:
1. For detecting logical errors, use ASSERT() calls instead of calls to
   BUG_ON().
2. In the patch "Btrfs: Compute and look up csums based on sectorsized
   blocks", fix usage of kmap_atomic/kunmap_atomic such that between the
   kmap_atomic() and kunmap_atomic() calls we do not invoke any function
   that might cause the current task to sleep.
   
Changes from V1:
1. Call round_[down,up]() functions instead of doing hard coded alignment.


Chandan Rajendra (13):
  Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to
    block size
  Btrfs: Compute and look up csums based on sectorsized blocks
  Btrfs: Direct I/O read: Work on sectorsized blocks
  Btrfs: fallocate: Work with sectorsized blocks
  Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units
  Btrfs: Search for all ordered extents that could span across a page
  Btrfs: Use (eb->start, seq) as search key for tree modification log
  Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length
  Btrfs: Limit inline extents to root->sectorsize
  Btrfs: Fix block size returned to user space
  Btrfs: Clean pte corresponding to page straddling i_size
  Btrfs: prepare_pages: Retry adding a page to the page cache
  Btrfs: Return valid delalloc range when the page does not have
    PG_Dirty flag set or has been invalidated

 fs/btrfs/ctree.c |  34 
 fs/btrfs/ctree.h |   2 +-
 fs/btrfs/extent_io.c |   5 +-
 fs/btrfs/file-item.c |  93 
 fs/btrfs/file.c  | 119 +
 fs/btrfs/inode.c | 239 ---
 6 files changed, 331 insertions(+), 161 deletions(-)

-- 
2.1.0



[RFC PATCH V4 11/13] Btrfs: Clean pte corresponding to page straddling i_size

2015-09-30 Thread Chandan Rajendra
When extending a file by either "truncate up" or by writing beyond i_size, the
page which had i_size needs to be marked "read only" so that future writes to
the page via the mmap interface cause btrfs_page_mkwrite() to be invoked. If not,
a write performed via the mmap interface after extending the file will find
the page to be writeable and continue writing to the page without invoking
btrfs_page_mkwrite(), i.e. we end up writing to a file without reserving disk
space.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file.c  | 12 ++--
 fs/btrfs/inode.c |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 360d56d..5715e29 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1757,6 +1757,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
ssize_t err;
loff_t pos;
size_t count;
+   loff_t oldsize;
+   int clean_page = 0;
 
mutex_lock(&inode->i_mutex);
err = generic_write_checks(iocb, from);
@@ -1795,14 +1797,17 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
pos = iocb->ki_pos;
count = iov_iter_count(from);
start_pos = round_down(pos, root->sectorsize);
-   if (start_pos > i_size_read(inode)) {
+   oldsize = i_size_read(inode);
+   if (start_pos > oldsize) {
/* Expand hole size to cover write data, preventing empty gap */
end_pos = round_up(pos + count, root->sectorsize);
-   err = btrfs_cont_expand(inode, i_size_read(inode), end_pos);
+   err = btrfs_cont_expand(inode, oldsize, end_pos);
if (err) {
mutex_unlock(&inode->i_mutex);
goto out;
}
+   if (start_pos > round_up(oldsize, root->sectorsize))
+   clean_page = 1;
}
 
if (sync)
@@ -1814,6 +1819,9 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
num_written = __btrfs_buffered_write(file, from, pos);
if (num_written > 0)
iocb->ki_pos = pos + num_written;
+   if (clean_page)
+   pagecache_isize_extended(inode, oldsize,
+   i_size_read(inode));
}
 
mutex_unlock(&inode->i_mutex);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c937357..f31da87 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4853,7 +4853,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
}
 
if (newsize > oldsize) {
-   truncate_pagecache(inode, newsize);
/*
 * Don't do an expanding truncate while snapshoting is ongoing.
 * This is to ensure the snapshot captures a fully consistent
@@ -4876,6 +4875,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 
i_size_write(inode, newsize);
btrfs_ordered_update_i_size(inode, i_size_read(inode), NULL);
+   pagecache_isize_extended(inode, oldsize, newsize);
ret = btrfs_update_inode(trans, root, inode);
btrfs_end_write_no_snapshoting(root);
btrfs_end_transaction(trans, root);
-- 
2.1.0
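
The condition under which the write path in the patch above sets clean_page can be sketched in user space. The function write_needs_pte_clean() and the two rounding helpers are hypothetical names for this example. The sketch models only the arithmetic, not the call to pagecache_isize_extended().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return x & ~(align - 1);
}

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * True when a write at pos starts in a block strictly beyond the block
 * that contains the old i_size, which is when the patch sets clean_page.
 */
static bool write_needs_pte_clean(uint64_t pos, uint64_t oldsize,
                                  uint32_t sectorsize)
{
    uint64_t start_pos = round_down_pow2(pos, sectorsize);

    if (start_pos <= oldsize)
        return false;
    return start_pos > round_up_pow2(oldsize, sectorsize);
}
```

With a 4K sectorsize and an old size of 5000 bytes, a write at offset 20000 sets the flag, while a write at offset 9000 (whose block starts right at the rounded-up old size, 8192) does not.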



[RFC PATCH V4 13/13] Btrfs: Return valid delalloc range when the page does not have PG_Dirty flag set or has been invalidated

2015-09-30 Thread Chandan Rajendra
The following issue was observed when running the generic/095 test on
the subpagesize-blocksize patchset.

Assume that we are trying to write a dirty page that is mapping file offset
range [159744, 163839].

writepage_delalloc()
  find_lock_delalloc_range(*start = 159744, *end = 0)
    find_delalloc_range()
      Returns range [X, Y] where (X > 163839)
    lock_delalloc_pages()
      One of the pages in range [X, Y] has dirty flag cleared;
      Loop once more restricting the delalloc range to span only
      PAGE_CACHE_SIZE bytes;
    find_delalloc_range()
      Returns range [356352, 360447];
    lock_delalloc_pages()
      The page [356352, 360447] has dirty flag cleared;
    Returns with *start = 159744 and *end = 0;
  *start = *end + 1;
  find_lock_delalloc_range(*start = 1, *end = 0)
    Finds and returns delalloc range [1, 12288];
  cow_file_range()
    Clears delalloc range [1, 12288]
    Create ordered extent for range [1, 12288]

The ordered extent thus created above breaks the rule that extents have to be
aligned to the filesystem's block size.

In cases where lock_delalloc_pages() fails (either due to PG_dirty flag being
cleared or the page no longer being a member of the inode's page cache), this
patch sets and returns the delalloc range that was found by
find_delalloc_range().

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0ee486a..3912d1f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1731,6 +1731,8 @@ again:
goto again;
} else {
found = 0;
+   *start = delalloc_start;
+   *end = delalloc_end;
goto out_failed;
}
}
-- 
2.1.0
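
The arithmetic behind the trace in the commit message above can be checked with a small user-space sketch. It assumes a 4K block size (consistent with the 4096-byte ranges in the trace) and uses two hypothetical helpers. It models only the "*start = *end + 1" step, not the delalloc search itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when offset is a multiple of sectorsize (a power of two). */
static bool block_aligned(uint64_t offset, uint32_t sectorsize)
{
    assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
    return (offset & (sectorsize - 1)) == 0;
}

/* The caller resumes its search one byte past the range end it got back. */
static uint64_t next_search_start(uint64_t returned_end)
{
    return returned_end + 1;
}
```

When the failed lookup hands back *end = 0, the next search starts at offset 1, which is not block aligned. If it hands back the end of the range that was found (360447 in the trace), the next search starts at 360448, which is aligned.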



[RFC PATCH V4 07/13] Btrfs: Use (eb->start, seq) as search key for tree modification log

2015-09-30 Thread Chandan Rajendra
In subpagesize-blocksize a page can map multiple extent buffers and hence
using (page index, seq) as the search key is incorrect. For example, searching
through tree modification log tree can return an entry associated with the
first extent buffer mapped by the page (if such an entry exists), when we are
actually searching for entries associated with extent buffers that are mapped
at position 2 or more in the page.
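
The key collision described above can be shown with a short user-space sketch. The two helper names are hypothetical. The 64K page size and 16K extent buffer size used in the usage note are example values, not figures taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Key before the patch: logical address shifted by the page size. */
static uint64_t tree_mod_key_old(uint64_t eb_start, unsigned int page_shift)
{
    assert(page_shift < 64);
    return eb_start >> page_shift;
}

/* Key after the patch: the extent buffer's logical start address. */
static uint64_t tree_mod_key_new(uint64_t eb_start)
{
    return eb_start;
}
```

With 64K pages (shift 16), extent buffers starting at 65536 and 81920 both map to key 1 under the old scheme, so a search for the second one can return entries of the first. Using eb->start directly keeps the keys distinct.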

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5f745ea..719ed3c 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -311,7 +311,7 @@ struct tree_mod_root {
 
 struct tree_mod_elem {
struct rb_node node;
-   u64 index;  /* shifted logical */
+   u64 logical;
u64 seq;
enum mod_log_op op;
 
@@ -435,11 +435,11 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
 
 /*
  * key order of the log:
- *   index -> sequence
+ *   node/leaf start address -> sequence
  *
- * the index is the shifted logical of the *new* root node for root replace
- * operations, or the shifted logical of the affected block for all other
- * operations.
+ * The 'start address' is the logical address of the *new* root node
+ * for root replace operations, or the logical address of the affected
+ * block for all other operations.
  *
  * Note: must be called with write lock (tree_mod_log_write_lock).
  */
@@ -460,9 +460,9 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
while (*new) {
cur = container_of(*new, struct tree_mod_elem, node);
parent = *new;
-   if (cur->index < tm->index)
+   if (cur->logical < tm->logical)
new = &((*new)->rb_left);
-   else if (cur->index > tm->index)
+   else if (cur->logical > tm->logical)
new = &((*new)->rb_right);
else if (cur->seq < tm->seq)
new = &((*new)->rb_left);
@@ -523,7 +523,7 @@ alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
if (!tm)
return NULL;
 
-   tm->index = eb->start >> PAGE_CACHE_SHIFT;
+   tm->logical = eb->start;
if (op != MOD_LOG_KEY_ADD) {
btrfs_node_key(eb, &tm->key, slot);
tm->blockptr = btrfs_node_blockptr(eb, slot);
@@ -588,7 +588,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
goto free_tms;
}
 
-   tm->index = eb->start >> PAGE_CACHE_SHIFT;
+   tm->logical = eb->start;
tm->slot = src_slot;
tm->move.dst_slot = dst_slot;
tm->move.nr_items = nr_items;
@@ -699,7 +699,7 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
goto free_tms;
}
 
-   tm->index = new_root->start >> PAGE_CACHE_SHIFT;
+   tm->logical = new_root->start;
tm->old_root.logical = old_root->start;
tm->old_root.level = btrfs_header_level(old_root);
tm->generation = btrfs_header_generation(old_root);
@@ -739,16 +739,15 @@ __tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq,
struct rb_node *node;
struct tree_mod_elem *cur = NULL;
struct tree_mod_elem *found = NULL;
-   u64 index = start >> PAGE_CACHE_SHIFT;
 
tree_mod_log_read_lock(fs_info);
tm_root = &fs_info->tree_mod_log;
node = tm_root->rb_node;
while (node) {
cur = container_of(node, struct tree_mod_elem, node);
-   if (cur->index < index) {
+   if (cur->logical < start) {
node = node->rb_left;
-   } else if (cur->index > index) {
+   } else if (cur->logical > start) {
node = node->rb_right;
} else if (cur->seq < min_seq) {
node = node->rb_left;
@@ -1230,9 +1229,10 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
return NULL;
 
/*
-* the very last operation that's logged for a root is the replacement
-* operation (if it is replaced at all). this has the index of the *new*
-* root, making it the very first operation that's logged for this root.
+* the very last operation that's logged for a root is the
+* replacement operation (if it is replaced at all). this has
+* the logical address of the *new* root, making it the very
+* first operation that's logged for this root.
 */
while (1) {
tm = tree_mod_log_search_oldest(fs_info, root_logical,
@@ -1336,7 +1336,7 @@ __tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
if (!next)
break;
 

[RFC PATCH V4 04/13] Btrfs: fallocate: Work with sectorsized blocks

2015-09-30 Thread Chandan Rajendra
While at it, this commit changes btrfs_truncate_page() to truncate sectorsized
blocks instead of pages. Hence the function has been renamed to
btrfs_truncate_block().
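
As a rough illustration of the same_block test in the file.c hunk below, here
is a minimal userspace sketch. The function name range_in_same_block() is
hypothetical and not part of the patch; it only mirrors the shift-and-compare
expression, with blocksize_bits standing in for inode->i_blkbits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): return true when the byte
 * range [offset, offset + len) lies entirely inside one block of size
 * (1 << blocksize_bits). len must be non-zero.
 */
bool range_in_same_block(uint64_t offset, uint64_t len,
			 unsigned int blocksize_bits)
{
	assert(len > 0 && blocksize_bits < 64);
	/* Compare the block number of the first and the last byte. */
	return (offset >> blocksize_bits) ==
	       ((offset + len - 1) >> blocksize_bits);
}
```

With 64K pages (shift 16) a 200-byte hole at offset 4000 sits inside one
page, but with 4K blocks (shift 12) the same hole straddles two blocks, so
the two granularities can give different answers for the same request.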

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/file.c  | 47 +--
 fs/btrfs/inode.c | 52 +++-
 3 files changed, 53 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe3..99a0fff 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3893,7 +3893,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len);
-int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
+int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 12ce401..360d56d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2280,23 +2280,26 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
u64 tail_len;
u64 orig_start = offset;
u64 cur_offset;
+   unsigned char blocksize_bits;
u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
u64 drop_end;
int ret = 0;
int err = 0;
int rsv_count;
-   bool same_page;
+   bool same_block;
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
u64 ino_size;
-   bool truncated_page = false;
+   bool truncated_block = false;
bool updated_inode = false;
 
+   blocksize_bits = inode->i_blkbits;
+
ret = btrfs_wait_ordered_range(inode, offset, len);
if (ret)
return ret;
 
mutex_lock(&inode->i_mutex);
-   ino_size = round_up(inode->i_size, PAGE_CACHE_SIZE);
+   ino_size = round_up(inode->i_size, root->sectorsize);
ret = find_first_non_hole(inode, &offset, &len);
if (ret < 0)
goto out_only_mutex;
@@ -2309,31 +2312,30 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
lockstart = round_up(offset, BTRFS_I(inode)->root->sectorsize);
lockend = round_down(offset + len,
 BTRFS_I(inode)->root->sectorsize) - 1;
-   same_page = ((offset >> PAGE_CACHE_SHIFT) ==
-   ((offset + len - 1) >> PAGE_CACHE_SHIFT));
-
+   same_block = ((offset >> blocksize_bits)
+   == ((offset + len - 1) >> blocksize_bits));
/*
-* We needn't truncate any page which is beyond the end of the file
+* We needn't truncate any block which is beyond the end of the file
 * because we are sure there is no data there.
 */
/*
-* Only do this if we are in the same page and we aren't doing the
-* entire page.
+* Only do this if we are in the same block and we aren't doing the
+* entire block.
 */
-   if (same_page && len < PAGE_CACHE_SIZE) {
+   if (same_block && len < root->sectorsize) {
if (offset < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode, offset, len, 0);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode, offset, len, 0);
} else {
ret = 0;
}
goto out_only_mutex;
}
 
-   /* zero back part of the first page */
+   /* zero back part of the first block */
if (offset < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode, offset, 0, 0);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode, offset, 0, 0);
if (ret) {
mutex_unlock(&inode->i_mutex);
return ret;
@@ -2368,9 +2370,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
if (!ret) {
/* zero the front end of the last page */
if (tail_start + tail_len < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode,
-   tail_start + tail_len, 0, 1);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode,
+   tail_start + tail_len,
+   0, 1);
if (ret)
   

[RFC PATCH V4 06/13] Btrfs: Search for all ordered extents that could span across a page

2015-09-30 Thread Chandan Rajendra
In subpagesize-blocksize scenario it is not sufficient to search using the
first byte of the page to make sure that there are no ordered extents
present across the page. Fix this.
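
To make the difference concrete, here is a small self-contained sketch. It
does not use the kernel's ordered-extent tree; struct ordered_range,
lookup_point() and lookup_range() are hypothetical stand-ins that only model
"look up by one byte" versus "look up by a byte range".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: covers [file_offset, file_offset + len). */
struct ordered_range {
	uint64_t file_offset;
	uint64_t len;
};

/* Point lookup: find an entry containing the single byte at pos. */
const struct ordered_range *lookup_point(const struct ordered_range *v,
					 size_t n, uint64_t pos)
{
	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (pos >= v[i].file_offset &&
		    pos < v[i].file_offset + v[i].len)
			return &v[i];
	return NULL;
}

/* Range lookup: find the first entry overlapping [start, start + len). */
const struct ordered_range *lookup_range(const struct ordered_range *v,
					 size_t n, uint64_t start,
					 uint64_t len)
{
	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (v[i].file_offset < start + len &&
		    start < v[i].file_offset + v[i].len)
			return &v[i];
	return NULL;
}
```

Assuming a 64K page starting at offset 0 and a single 4K ordered range at
offset 8192, the point lookup at the first byte of the page finds nothing,
while the range lookup over the whole page finds the entry.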

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/extent_io.c |  3 ++-
 fs/btrfs/inode.c | 25 ++---
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 11aa8f7..0ee486a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3224,7 +3224,8 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 
while (1) {
lock_extent(tree, start, end);
-   ordered = btrfs_lookup_ordered_extent(inode, start);
+   ordered = btrfs_lookup_ordered_range(inode, start,
+   PAGE_CACHE_SIZE);
if (!ordered)
break;
unlock_extent(tree, start, end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5e6052d..4fbe9de 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1975,7 +1975,8 @@ again:
if (PagePrivate2(page))
goto out;
 
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start,
+   PAGE_CACHE_SIZE);
if (ordered) {
unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start,
 page_end, &cached_state, GFP_NOFS);
@@ -8552,6 +8553,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
struct extent_state *cached_state = NULL;
u64 page_start = page_offset(page);
u64 page_end = page_start + PAGE_CACHE_SIZE - 1;
+   u64 start;
+   u64 end;
int inode_evicting = inode->i_state & I_FREEING;
 
/*
@@ -8571,14 +8574,18 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
if (!inode_evicting)
lock_extent_bits(tree, page_start, page_end, 0, &cached_state);
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+again:
+   start = page_start;
+   ordered = btrfs_lookup_ordered_range(inode, start,
+   page_end - start + 1);
if (ordered) {
+   end = min(page_end, ordered->file_offset + ordered->len - 1);
/*
 * IO on this page will never be started, so we need
 * to account for any ordered extents now
 */
if (!inode_evicting)
-   clear_extent_bit(tree, page_start, page_end,
+   clear_extent_bit(tree, start, end,
 EXTENT_DIRTY | EXTENT_DELALLOC |
 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
 EXTENT_DEFRAG, 1, 0, &cached_state,
@@ -8595,22 +8602,26 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
spin_lock_irq(&tree->lock);
set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
-   new_len = page_start - ordered->file_offset;
+   new_len = start - ordered->file_offset;
if (new_len < ordered->truncated_len)
ordered->truncated_len = new_len;
spin_unlock_irq(&tree->lock);
 
if (btrfs_dec_test_ordered_pending(inode, &ordered,
-  page_start,
-  PAGE_CACHE_SIZE, 1))
+  start,
+  end - start + 1, 1))
btrfs_finish_ordered_io(ordered);
}
btrfs_put_ordered_extent(ordered);
if (!inode_evicting) {
cached_state = NULL;
-   lock_extent_bits(tree, page_start, page_end, 0,
+   lock_extent_bits(tree, start, end, 0,
 &cached_state);
}
+
+   start = end + 1;
+   if (start < page_end)
+   goto again;
}
 
if (!inode_evicting) {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH V4 12/13] Btrfs: prepare_pages: Retry adding a page to the page cache

2015-09-30 Thread Chandan Rajendra
When reading the page from the disk, we can race with Direct I/O which can get
the page lock (before prepare_uptodate_page() gets it) and can go ahead and
invalidate the page. Hence if the page is not found in the inode's address
space, retry the operation of getting a page.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5715e29..76db77c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1316,6 +1316,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
int faili;
 
for (i = 0; i < num_pages; i++) {
+again:
pages[i] = find_or_create_page(inode->i_mapping, index + i,
   mask | __GFP_WRITE);
if (!pages[i]) {
@@ -1330,6 +1331,21 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
if (i == num_pages - 1)
err = prepare_uptodate_page(pages[i],
pos + write_bytes, false);
+
+   /*
+* When reading the page from the disk, we can race
+* with direct i/o which can get the page lock (before
+* prepare_uptodate_page() gets it) and can go ahead
+* and invalidate the page. Hence if the page is found
+* to be not belonging to the inode's address space,
+* retry the operation of getting a page.
+*/
+   if (unlikely(pages[i]->mapping != inode->i_mapping)) {
+   unlock_page(pages[i]);
+   page_cache_release(pages[i]);
+   goto again;
+   }
+
if (err) {
page_cache_release(pages[i]);
faili = i - 1;
-- 
2.1.0



[RFC PATCH V4 01/13] Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size

2015-09-30 Thread Chandan Rajendra
Currently, the code reserves/releases extents in multiples of PAGE_CACHE_SIZE
units. Fix this by doing reservation/releases in block size units.
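
A minimal userspace sketch of the arithmetic, assuming power-of-two sizes.
The function names below are hypothetical and only mirror the expressions in
the diff: the old reservation rounds the write up to whole pages, the new one
rounds it up to whole sectors.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Old scheme: reserve whole pages covering [pos, pos + write_bytes). */
uint64_t reserve_bytes_by_page(uint64_t pos, uint64_t write_bytes,
			       uint64_t page_size)
{
	uint64_t offset = pos & (page_size - 1);

	return round_up_pow2(write_bytes + offset, page_size);
}

/* New scheme: reserve whole sectors covering the same byte range. */
uint64_t reserve_bytes_by_sector(uint64_t pos, uint64_t write_bytes,
				 uint64_t sectorsize)
{
	uint64_t sector_offset = pos & (sectorsize - 1);

	return round_up_pow2(write_bytes + sector_offset, sectorsize);
}
```

For a 100-byte write at pos 5000 with 64K pages and 4K sectors, the
page-based calculation reserves 65536 bytes while the sector-based one
reserves 4096. When page size and sector size are equal, both give the same
result.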

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file.c | 44 +++-
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b823fac..12ce401 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -499,7 +499,7 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
loff_t isize = i_size_read(inode);
 
start_pos = pos & ~((u64)root->sectorsize - 1);
-   num_bytes = ALIGN(write_bytes + pos - start_pos, root->sectorsize);
+   num_bytes = round_up(write_bytes + pos - start_pos, root->sectorsize);
 
end_of_last_block = start_pos + num_bytes - 1;
err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
@@ -1362,16 +1362,19 @@ fail:
 static noinline int
 lock_and_cleanup_extent_if_need(struct inode *inode, struct page **pages,
size_t num_pages, loff_t pos,
+   size_t write_bytes,
u64 *lockstart, u64 *lockend,
struct extent_state **cached_state)
 {
+   struct btrfs_root *root = BTRFS_I(inode)->root;
u64 start_pos;
u64 last_pos;
int i;
int ret = 0;
 
-   start_pos = pos & ~((u64)PAGE_CACHE_SIZE - 1);
-   last_pos = start_pos + ((u64)num_pages << PAGE_CACHE_SHIFT) - 1;
+   start_pos = round_down(pos, root->sectorsize);
+   last_pos = start_pos
+   + round_up(pos + write_bytes - start_pos, root->sectorsize) - 1;
 
if (start_pos < inode->i_size) {
struct btrfs_ordered_extent *ordered;
@@ -1489,6 +1492,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 
while (iov_iter_count(i) > 0) {
size_t offset = pos & (PAGE_CACHE_SIZE - 1);
+   size_t sector_offset;
size_t write_bytes = min(iov_iter_count(i),
 nrptrs * (size_t)PAGE_CACHE_SIZE -
 offset);
@@ -1497,6 +1501,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
size_t reserve_bytes;
size_t dirty_pages;
size_t copied;
+   size_t dirty_sectors;
+   size_t num_sectors;
 
WARN_ON(num_pages > nrptrs);
 
@@ -1509,8 +1515,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
break;
}
 
-   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+   sector_offset = pos & (root->sectorsize - 1);
+   reserve_bytes = round_up(write_bytes + sector_offset,
+   root->sectorsize);
+
ret = btrfs_check_data_free_space(inode, reserve_bytes, write_bytes);
+
if (ret == -ENOSPC &&
(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
  BTRFS_INODE_PREALLOC))) {
@@ -1523,7 +1533,10 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 */
num_pages = DIV_ROUND_UP(write_bytes + offset,
 PAGE_CACHE_SIZE);
-   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+   reserve_bytes = round_up(write_bytes
+   + sector_offset,
+   root->sectorsize);
+
ret = 0;
} else {
ret = -ENOSPC;
@@ -1558,8 +1571,8 @@ again:
break;
 
ret = lock_and_cleanup_extent_if_need(inode, pages, num_pages,
- pos, &lockstart, &lockend,
- &cached_state);
+   pos, write_bytes, &lockstart,
+   &lockend, &cached_state);
if (ret < 0) {
if (ret == -EAGAIN)
goto again;
@@ -1595,9 +1608,14 @@ again:
 * we still have an outstanding extent for the chunk we actually
 * managed to copy.
 */
-   if (num_pages > dirty_pages) {
-   release_bytes = (num_pages - dirty_pages) <<
-   PAGE_CACHE_SHIFT;
+   num_sectors = reserve_bytes >> inode->i_blkbits;
+   dirty_sectors = round_up(copied + sector_offset,
+   root->sectorsize);
+   dirty_sectors >>= 

[PATCH V5 00/13] Btrfs: Pre subpagesize-blocksize cleanups

2015-09-30 Thread Chandan Rajendra
The patches posted along with this cover letter are cleanups made
during the development of the subpagesize-blocksize patchset. I believe
that they can be integrated with the mainline kernel. Hence I have
posted them separately from the subpagesize-blocksize patchset.

I have tested the patchset by running xfstests on ppc64 and
x86_64. On ppc64, some of the Btrfs specific tests and generic/255
fail because they assume 4K as the filesystem's block size. I have
fixed some of the test cases. I will fix the rest and mail them to the
fstests mailing list in the near future.

Changes from V4:
1. Removed the RFC tag.

Changes from V3:
Two new issues have been fixed by the patches,
1. Btrfs: prepare_pages: Retry adding a page to the page cache.
2. Btrfs: Return valid delalloc range when the page does not have
   PG_Dirty flag set or has been invalidated.
IMHO, the above issues are also applicable to the "page size == block
size" scenario but for reasons unknown to me they aren't seen even
when the tests are run for a long time.

Changes from V2:
1. For detecting logical errors, use ASSERT() calls instead of calls to
   BUG_ON().
2. In the patch "Btrfs: Compute and look up csums based on sectorsized
   blocks", fix usage of kmap_atomic/kunmap_atomic such that between the
   kmap_atomic() and kunmap_atomic() calls we do not invoke any function
   that might cause the current task to sleep.
   
Changes from V1:
1. Call round_[down,up]() functions instead of doing hard coded alignment.

Chandan Rajendra (13):
  Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to
block size
  Btrfs: Compute and look up csums based on sectorsized blocks
  Btrfs: Direct I/O read: Work on sectorsized blocks
  Btrfs: fallocate: Work with sectorsized blocks
  Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units
  Btrfs: Search for all ordered extents that could span across a page
  Btrfs: Use (eb->start, seq) as search key for tree modification log
  Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length
  Btrfs: Limit inline extents to root->sectorsize
  Btrfs: Fix block size returned to user space
  Btrfs: Clean pte corresponding to page straddling i_size
  Btrfs: prepare_pages: Retry adding a page to the page cache
  Btrfs: Return valid delalloc range when the page does not have
PG_Dirty flag set or has been invalidated

 fs/btrfs/ctree.c |  34 
 fs/btrfs/ctree.h |   2 +-
 fs/btrfs/extent_io.c |   5 +-
 fs/btrfs/file-item.c |  93 
 fs/btrfs/file.c  | 119 +
 fs/btrfs/inode.c | 239 ---
 6 files changed, 331 insertions(+), 161 deletions(-)

-- 
2.1.0



Re: [RFC PATCH] fstests: generic: Test that fsync works on file in overlayfs merged directory

2015-09-30 Thread Dave Chinner
On Wed, Sep 30, 2015 at 10:57:45PM +0300, Roman Lebedev wrote:
> As per overlayfs documentation, any activity on a merged directory
> for an application that is doing such activity should work exactly
> as if that would be a normal, non overlayfs-merged directory.
> 
> That is, e.g. simple fopen-fwrite-fsync-fclose sequence should
> work just fine.

We have plenty of tests that do things like that.

> But apparently it does not. Add a simple generic test to check that.
> As of right now (linux-4.2.1) this test fails at least on btrfs.
> 
> PS: An alternative (and probably better approach) would be to run
> fstests test suite with TEST_DIR set to overlayfs work directory.

Much better is to run xfstests directly on overlayfs. There have
been some patches to do that posted in the past, but those patches
and discussions kinda ended up going nowhere:

http://www.mail-archive.com/fstests@vger.kernel.org/msg00474.html

Perhaps you'd like to pick this up, and then overlay will be much
easier to test and hence less likely to have bugs like this...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC PATCH] fstests: generic: Test that fsync works on file in overlayfs merged directory

2015-09-30 Thread Eric Sandeen


On 9/30/15 4:56 PM, Dave Chinner wrote:
> On Wed, Sep 30, 2015 at 10:57:45PM +0300, Roman Lebedev wrote:
>> As per overlayfs documentation, any activity on a merged directory
>> for an application that is doing such activity should work exactly
>> as if that would be a normal, non overlayfs-merged directory.
>>
>> That is, e.g. simple fopen-fwrite-fsync-fclose sequence should
>> work just fine.
> 
> We have plenty of tests that do things like that.
> 
>> But apparently it does not. Add a simple generic test to check that.
>> As of right now (linux-4.2.1) this test fails at least on btrfs.
>>
>> PS: An alternative (and probably better approach) would be to run
>> fstests test suite with TEST_DIR set to overlayfs work directory.
> 
> Much better is to run xfstests directly on overlayfs. There have
> been some patches to do that posted in the past, but those patches
> and discussions kinda ended up going nowhere:
> 
> http://www.mail-archive.com/fstests@vger.kernel.org/msg00474.html
> 
> Perhaps you'd like to pick this up, and then overlay will be much
> easier to test and hence less likely to have bugs like this...

Yeah, that could still be used for fun, but Zach's POV was that
we should just have a specific overlayfs config (dictating paths
to over/under/merge/around/through/whatever directories), a special
mount_overlayfs helper, etc, ala NFS & CIFS.  It may actually be
easier than what I proposed.

If you want to take a stab at it I'm happy to help, answer questions,
etc - I'm not sure when I'll get back to it...

-Eric


Re: RAID5 doesn't mount on boot, but you can afterwards?

2015-09-30 Thread Duncan
Sjoerd posted on Wed, 30 Sep 2015 18:49:21 +0200 as excerpted:

> A RAID5 setup on raw devices doesn't want to automount on boot. After I
> skip mounting I can log in (Ubuntu server 14.04 on kernel 4.1.8) and
> just do a "sudo mount -a" to get all mounted fine. So the array doesn't
> seem to be broken. "btrfs fi show /data" doesn't show anything wrong
> either.
> 
> The only weird thing I see in the syslog is :
> 
> BTRFS info (device sdd): disk space caching is enabled BTRFS: has skinny
> extents BTRFS: failed to read the system array on sdd BTRFS: open_ctree
> failed
>  
> If I reboot the machine the drive in the log changed and looks random
> (i've seen in 3 boots sda, sdc and sde passing by)
> 
> I am using btrfs-progs 4.2.1 if that matters in this case...
> 
> Anyone have a clue why it's not automounting? Or something I can do to
> troubleshoot?

That's very likely because, unlike traditional single-device filesystems 
(including single-device btrfs), a multi-device btrfs has several devices 
the kernel must know about before it can mount the filesystem, while mount 
only feeds it one device.

There are two ways to tell btrfs (the kernel side) about the other 
devices.

1) Do a btrfs device scan before trying to mount.

2) Name the component devices in the mount options, using the device= 
option (multiple times as necessary to list all devices).

For various reasons including dynamic device discovery effectively 
randomizing device sd* assignment, btrfs device scan is the normally used 
option.

What's probably happening is that at some point in the boot process, 
btrfs device scan is being run automatically, but only after the attempt 
to mount the filesystem during boot. So the boot-time mount fails, 
but a manual mount succeeds, because the scan has already been done 
by the time you get a prompt in order to run the command.

So what you need to do is find the service that runs the btrfs device 
scan, and make the mount depend on it, so the scan is done before the 
attempt to mount.  Then it should work.

Or, if it's easier, simply create a new service that runs the scan, and 
have it run before the mount. Running the scan a second time won't hurt 
anything; it simply needs to run before the mount is attempted, in order 
for btrfs to know what devices compose the filesystem so it can be 
mounted.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



[PATCH v5 5/9] vfs: Copy shouldn't forbid ranges inside the same file

2015-09-30 Thread Anna Schumaker
This is perfectly valid for BTRFS and XFS, so let's leave this up to
filesystems to check.

Signed-off-by: Anna Schumaker 
Reviewed-by: David Sterba 
Reviewed-by: Darrick J. Wong 
---
 fs/read_write.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index f3d6c48..8e7cb33 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1371,10 +1371,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
file_in->f_path.mnt != file_out->f_path.mnt)
return -EXDEV;
 
-   /* forbid ranges in the same file */
-   if (inode_in == inode_out)
-   return -EINVAL;
-
if (len == 0)
return 0;
 
-- 
2.6.0



[PATCH v5 9/9] btrfs: btrfs_copy_file_range() only supports reflinks

2015-09-30 Thread Anna Schumaker
Reject copies that don't have the COPY_FR_REFLINK flag set.

Signed-off-by: Anna Schumaker 
Reviewed-by: David Sterba 
---
 fs/btrfs/ioctl.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d3697e8..c1f115d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -3848,6 +3849,9 @@ ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
 {
ssize_t ret;
 
+   if (!(flags & COPY_FR_REFLINK))
+   return -EOPNOTSUPP;
+
ret = btrfs_clone_files(file_out, file_in, pos_in, len, pos_out);
if (ret == 0)
ret = len;
-- 
2.6.0



[PATCH v5 4/9] vfs: Copy should check len after file open mode

2015-09-30 Thread Anna Schumaker
I don't think it makes sense to report that a copy succeeded if the
files aren't open properly.

Signed-off-by: Anna Schumaker 
Reviewed-by: David Sterba 
---
 fs/read_write.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index dd10750..f3d6c48 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1345,9 +1345,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
if (flags)
return -EINVAL;
 
-   if (len == 0)
-   return 0;
-
/* copy_file_range allows full ssize_t len, ignoring MAX_RW_COUNT  */
ret = rw_verify_area(READ, file_in, &pos_in, len);
if (ret >= 0)
@@ -1378,6 +1375,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
if (inode_in == inode_out)
return -EINVAL;
 
+   if (len == 0)
+   return 0;
+
ret = mnt_want_write_file(file_out);
if (ret)
return ret;
-- 
2.6.0



[PATCH v5 10/9] copy_file_range.2: New page documenting copy_file_range()

2015-09-30 Thread Anna Schumaker
copy_file_range() is a new system call for copying ranges of data
completely in the kernel.  This gives filesystems an opportunity to
implement some kind of "copy acceleration", such as reflinks or
server-side-copy (in the case of NFS).

Signed-off-by: Anna Schumaker 
Reviewed-by: Darrick J. Wong 
---
 man2/copy_file_range.2 | 224 +
 man2/splice.2  |   1 +
 2 files changed, 225 insertions(+)
 create mode 100644 man2/copy_file_range.2

diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
new file mode 100644
index 000..23e3875
--- /dev/null
+++ b/man2/copy_file_range.2
@@ -0,0 +1,224 @@
+.\"This manpage is Copyright (C) 2015 Anna Schumaker 

+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of
+.\" this manual under the conditions for verbatim copying, provided that
+.\" the entire resulting derived work is distributed under the terms of
+.\" a permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume.
+.\" no responsibility for errors or omissions, or for damages resulting.
+.\" from the use of the information contained herein.  The author(s) may.
+.\" not have taken the same level of care in the production of this.
+.\" manual, which is licensed free of charge, as they might when working.
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH COPY 2 2015-09-29 "Linux" "Linux Programmer's Manual"
+.SH NAME
+copy_file_range \- Copy a range of data from one file to another
+.SH SYNOPSIS
+.nf
+.B #include 
+.B #include 
+.B #include 
+
+.BI "ssize_t copy_file_range(int " fd_in ", loff_t *" off_in ", int " fd_out ",
+.BI "loff_t *" off_out ", size_t " len \
+", unsigned int " flags );
+.fi
+.SH DESCRIPTION
+The
+.BR copy_file_range ()
+system call performs an in-kernel copy between two file descriptors
+without the additional cost of transferring data from the kernel to userspace
+and then back into the kernel.
+It copies up to
+.I len
+bytes of data from file descriptor
+.I fd_in
+to file descriptor
+.IR fd_out ,
+overwriting any data that exists within the requested range of the target file.
+
+The following semantics apply for
+.IR off_in ,
+and similar statements apply to
+.IR off_out :
+.IP * 3
+If
+.I off_in
+is NULL, then bytes are read from
+.I fd_in
+starting from the current file offset, and the offset is
+adjusted by the number of bytes copied.
+.IP *
+If
+.I off_in
+is not NULL, then
+.I off_in
+must point to a buffer that specifies the starting
+offset where bytes from
+.I fd_in
+will be read.  The current file offset of
+.I fd_in
+is not changed, but
+.I off_in
+is adjusted appropriately.
+.PP
+
+The
+.I flags
+argument can have one of the following flags set:
+.TP 1.9i
+.B COPY_FR_COPY
+Copy all the file data in the requested range.
+Some filesystems might be able to accelerate this copy
+to avoid unnecessary data transfers.
+.TP
+.B COPY_FR_REFLINK
+Create a lightweight "reflink", where data is not copied until
+one of the files is modified.
+.TP
+.B COPY_FR_DEDUP
+Create a reflink, but only if the contents of
+both files' byte ranges are identical.
+If ranges do not match,
+.B EILSEQ
+will be returned.
+.PP
+The default behavior
+.RI ( flags
+== 0) is to try creating a reflink,
+and if reflinking fails
+.BR copy_file_range ()
+will fall back to performing a full data copy.
+.SH RETURN VALUE
+Upon successful completion,
+.BR copy_file_range ()
+will return the number of bytes copied between files.
+This could be less than the length originally requested.
+
+On error,
+.BR copy_file_range ()
+returns \-1 and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EBADF
+One or more file descriptors are not valid; or
+.I fd_in
+is not open for reading; or
+.I fd_out
+is not open for writing.
+.TP
+.B EILSEQ
+The contents of both files' byte ranges did not match.
+.TP
+.B EINVAL
+Requested range extends beyond the end of the source file; or the
+.I flags
+argument is set to an invalid value.
+.TP
+.B EIO
+A low level I/O error occurred while copying.
+.TP
+.B ENOMEM
+Out of memory.
+.TP
+.B ENOSPC
+There is not enough space on the target filesystem to complete the copy.
+.TP
+.B EOPNOTSUPP
+.B COPY_FR_REFLINK
+or
+.B COPY_FR_DEDUP
+was specified in
+.IR flags ,
+but the target filesystem does not support the given operation.
+.TP
+.B EXDEV
+Target filesystem doesn't support cross-filesystem copies.
+.SH VERSIONS
+The
+.BR 

[PATCH v5 1/9] vfs: add copy_file_range syscall and vfs helper

2015-09-30 Thread Anna Schumaker
From: Zach Brown 

Add a copy_file_range() system call for offloading copies between
regular files.

This gives an interface to underlying layers of the storage stack which
can copy without reading and writing all the data.  There are a few
candidates that should support copy offloading in the nearer term:

- btrfs shares extent references with its clone ioctl
- NFS has patches to add a COPY command which copies on the server
- SCSI has a family of XCOPY commands which copy in the device

This system call avoids the complexity of also accelerating the creation
of the destination file by operating on an existing destination file
descriptor, not a path.

Currently the high level vfs entry point limits copy offloading to files
on the same mount and super (and not in the same file).  This can be
relaxed if we get implementations which can copy between file systems
safely.

Signed-off-by: Zach Brown 
[Anna Schumaker: Change -EINVAL to -EBADF during file verification]
[Anna Schumaker: Change flags parameter from int to unsigned int]
[Anna Schumaker: Add function to include/linux/syscalls.h]
Signed-off-by: Anna Schumaker 
---
v5:
- Bump syscall number again
- Add to include/linux/syscalls.h
---
 fs/read_write.c   | 129 ++
 include/linux/fs.h|   3 +
 include/linux/syscalls.h  |   3 +
 include/uapi/asm-generic/unistd.h |   4 +-
 kernel/sys_ni.c   |   1 +
 5 files changed, 139 insertions(+), 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 819ef3f..dd10750 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 #include 
@@ -1327,3 +1328,131 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
return do_sendfile(out_fd, in_fd, NULL, count, 0);
 }
 #endif
+
+/*
+ * copy_file_range() differs from regular file read and write in that it
+ * specifically allows return partial success.  When it does so is up to
+ * the copy_file_range method.
+ */
+ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
+   struct file *file_out, loff_t pos_out,
+   size_t len, unsigned int flags)
+{
+   struct inode *inode_in;
+   struct inode *inode_out;
+   ssize_t ret;
+
+   if (flags)
+   return -EINVAL;
+
+   if (len == 0)
+   return 0;
+
+   /* copy_file_range allows full ssize_t len, ignoring MAX_RW_COUNT  */
+   ret = rw_verify_area(READ, file_in, &pos_in, len);
+   if (ret >= 0)
+   ret = rw_verify_area(WRITE, file_out, &pos_out, len);
+   if (ret < 0)
+   return ret;
+
+   if (!(file_in->f_mode & FMODE_READ) ||
+   !(file_out->f_mode & FMODE_WRITE) ||
+   (file_out->f_flags & O_APPEND) ||
+   !file_in->f_op || !file_in->f_op->copy_file_range)
+   return -EBADF;
+
+   inode_in = file_inode(file_in);
+   inode_out = file_inode(file_out);
+
+   /* make sure offsets don't wrap and the input is inside i_size */
+   if (pos_in + len < pos_in || pos_out + len < pos_out ||
+   pos_in + len > i_size_read(inode_in))
+   return -EINVAL;
+
+   /* this could be relaxed once a method supports cross-fs copies */
+   if (inode_in->i_sb != inode_out->i_sb ||
+   file_in->f_path.mnt != file_out->f_path.mnt)
+   return -EXDEV;
+
+   /* forbid ranges in the same file */
+   if (inode_in == inode_out)
+   return -EINVAL;
+
+   ret = mnt_want_write_file(file_out);
+   if (ret)
+   return ret;
+
+   ret = file_in->f_op->copy_file_range(file_in, pos_in, file_out, pos_out,
+len, flags);
+   if (ret > 0) {
+   fsnotify_access(file_in);
+   add_rchar(current, ret);
+   fsnotify_modify(file_out);
+   add_wchar(current, ret);
+   }
+   inc_syscr(current);
+   inc_syscw(current);
+
+   mnt_drop_write_file(file_out);
+
+   return ret;
+}
+EXPORT_SYMBOL(vfs_copy_file_range);
+
+SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
+   int, fd_out, loff_t __user *, off_out,
+   size_t, len, unsigned int, flags)
+{
+   loff_t pos_in;
+   loff_t pos_out;
+   struct fd f_in;
+   struct fd f_out;
+   ssize_t ret;
+
+   f_in = fdget(fd_in);
+   f_out = fdget(fd_out);
+   if (!f_in.file || !f_out.file) {
+   ret = -EBADF;
+   goto out;
+   }
+
+   ret = -EFAULT;
+   if (off_in) {
+   if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
+   goto out;
+   } else {
+   pos_in = f_in.file->f_pos;
+   }
+
+   if (off_out) {
+   if 

[PATCH v5 8/9] vfs: Add vfs_copy_file_range() support for pagecache copies

2015-09-30 Thread Anna Schumaker
This allows us to have an in-kernel copy mechanism that avoids frequent
switches between kernel and user space.  This is especially useful so
NFSD can support server-side copies.

I make pagecache copies configurable by adding three new (exclusive)
flags:
- COPY_FR_REFLINK tells vfs_copy_file_range() to only create a reflink.
- COPY_FR_COPY does a full data copy, but may be filesystem accelerated.
- COPY_FR_DEDUP creates a reflink, but only if the contents of both
  ranges are identical.

The default (flags=0) means to first attempt a reflink, but use the pagecache
if that fails.
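
For illustration only, the flag handling described above can be sketched as
a standalone function. The bit values below are hypothetical (the real ones
are defined in include/uapi/linux/copy.h, which is not shown here), and
resolve_copy_flags() is a name invented for this sketch. It mirrors the
three exclusivity tests and the flags == 0 default from the hunk.

```c
#include <errno.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define COPY_FR_COPY	(1u << 0)
#define COPY_FR_REFLINK	(1u << 1)
#define COPY_FR_DEDUP	(1u << 2)

/*
 * Returns the effective flag set, or -EINVAL when a known flag is
 * combined with any other bit. As in the hunk, a value that contains
 * only unknown bits is not rejected by these three tests.
 */
int resolve_copy_flags(unsigned int flags)
{
	if ((flags & COPY_FR_COPY) && (flags & ~COPY_FR_COPY))
		return -EINVAL;
	if ((flags & COPY_FR_REFLINK) && (flags & ~COPY_FR_REFLINK))
		return -EINVAL;
	if ((flags & COPY_FR_DEDUP) && (flags & ~COPY_FR_DEDUP))
		return -EINVAL;

	/* default: try a reflink first, fall back to a pagecache copy */
	if (flags == 0)
		flags = COPY_FR_COPY | COPY_FR_REFLINK;

	return (int)flags;
}
```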

I moved the rw_verify_area() calls into the fallback code since some
filesystems can handle reflinking a large range.

Signed-off-by: Anna Schumaker 
Reviewed-by: Darrick J. Wong 
Reviewed-by: Padraig Brady 
---
 fs/read_write.c   | 61 +++
 include/linux/copy.h  |  6 +
 include/uapi/linux/Kbuild |  1 +
 include/uapi/linux/copy.h |  8 +++
 4 files changed, 56 insertions(+), 20 deletions(-)
 create mode 100644 include/linux/copy.h
 create mode 100644 include/uapi/linux/copy.h

diff --git a/fs/read_write.c b/fs/read_write.c
index ee9fa37..4fb9b8e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -7,6 +7,7 @@
 #include  
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1329,6 +1330,29 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, 
in_fd,
 }
 #endif
 
+static ssize_t vfs_copy_file_pagecache(struct file *file_in, loff_t pos_in,
+  struct file *file_out, loff_t pos_out,
+  size_t len)
+{
+   ssize_t ret;
+
+   ret = rw_verify_area(READ, file_in, &pos_in, len);
+   if (ret >= 0) {
+   len = ret;
+   ret = rw_verify_area(WRITE, file_out, &pos_out, len);
+   if (ret >= 0)
+   len = ret;
+   }
+   if (ret < 0)
+   return ret;
+
+   file_start_write(file_out);
+   ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, len, 0);
+   file_end_write(file_out);
+
+   return ret;
+}
+
 /*
  * copy_file_range() differs from regular file read and write in that it
  * specifically allows return partial success.  When it does so is up to
@@ -1338,34 +1362,26 @@ ssize_t vfs_copy_file_range(struct file *file_in, 
loff_t pos_in,
struct file *file_out, loff_t pos_out,
size_t len, unsigned int flags)
 {
-   struct inode *inode_in;
-   struct inode *inode_out;
ssize_t ret;
 
-   if (flags)
+   /* Flags should only be used exclusively. */
+   if ((flags & COPY_FR_COPY) && (flags & ~COPY_FR_COPY))
+   return -EINVAL;
+   if ((flags & COPY_FR_REFLINK) && (flags & ~COPY_FR_REFLINK))
+   return -EINVAL;
+   if ((flags & COPY_FR_DEDUP) && (flags & ~COPY_FR_DEDUP))
return -EINVAL;
 
-   /* copy_file_range allows full ssize_t len, ignoring MAX_RW_COUNT  */
-   ret = rw_verify_area(READ, file_in, &pos_in, len);
-   if (ret >= 0)
-   ret = rw_verify_area(WRITE, file_out, &pos_out, len);
-   if (ret < 0)
-   return ret;
+   /* Default behavior is to try both. */
+   if (flags == 0)
+   flags = COPY_FR_COPY | COPY_FR_REFLINK;
 
if (!(file_in->f_mode & FMODE_READ) ||
!(file_out->f_mode & FMODE_WRITE) ||
(file_out->f_flags & O_APPEND) ||
-   !file_out->f_op || !file_out->f_op->copy_file_range)
+   !file_out->f_op)
return -EBADF;
 
-   inode_in = file_inode(file_in);
-   inode_out = file_inode(file_out);
-
-   /* make sure offsets don't wrap and the input is inside i_size */
-   if (pos_in + len < pos_in || pos_out + len < pos_out ||
-   pos_in + len > i_size_read(inode_in))
-   return -EINVAL;
-
if (len == 0)
return 0;
 
@@ -1373,8 +1389,13 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t 
pos_in,
if (ret)
return ret;
 
-   ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, 
pos_out,
- len, flags);
+   ret = -EOPNOTSUPP;
+   if (file_out->f_op->copy_file_range && (file_in->f_op == 
file_out->f_op))
+   ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
+ pos_out, len, flags);
+   if ((ret < 0) && (flags & COPY_FR_COPY))
+   ret = vfs_copy_file_pagecache(file_in, pos_in, file_out,
+ pos_out, len);
if (ret > 0) {
fsnotify_access(file_in);
add_rchar(current, ret);
diff --git a/include/linux/copy.h b/include/linux/copy.h
new file mode 100644
index 

[PATCH v5 7/9] vfs: Remove copy_file_range mountpoint checks

2015-09-30 Thread Anna Schumaker
I still want to do an in-kernel copy even if the files are on different
mountpoints, and NFS has a "server to server" copy that expects two
files on different mountpoints.  Let's have individual filesystems
implement this check instead.

Signed-off-by: Anna Schumaker 
Reviewed-by: David Sterba 
---
 fs/read_write.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 6f74f1f..ee9fa37 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1366,11 +1366,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t 
pos_in,
pos_in + len > i_size_read(inode_in))
return -EINVAL;
 
-   /* this could be relaxed once a method supports cross-fs copies */
-   if (inode_in->i_sb != inode_out->i_sb ||
-   file_in->f_path.mnt != file_out->f_path.mnt)
-   return -EXDEV;
-
if (len == 0)
return 0;
 
-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix a compiler warning of may be used uninitialized

2015-09-30 Thread David Sterba
On Wed, Sep 30, 2015 at 11:55:13AM +0800, Zhao Lei wrote:
> > AFAICS the codepath that would use uninitialized value of count is not
> > reachable:
> > 
> > add_to_ctl = true
> > 
> >     if (info->offset > root->ino_cache_progress)
> >         add_to_ctl = false;
> >     else if (info->offset + info->bytes > root->ino_cache_progress)
> >         count = root->ino_cache_progress - info->offset + 1;
> >     else
> >         count = info->bytes;
> >
> >     rb_erase(&info->offset_index, rbroot);
> >     spin_unlock(rbroot_lock);
> >     if (add_to_ctl)
> >         __btrfs_add_free_space(ctl, info->offset, count);
> > 
> > count is defined iff add_to_ctl == true, so the patch is not necessary. And 
> > I'm
> > not quite sure that 0 passed down to __btrfs_add_free_space as 'bytes' makes
> > sense at all.
> 
> Agree above all.
> 
> So I write following description in changelog:
>   "Not real problem, just avoid warning of: ..."
> 
> It is just to avoid complier warning, no function changed.
> A warning in compiler output is not pretty:)

And the compiler is wrong in this case, the code is fine as is. I'd say
go fix your compiler and the output will be pretty :) No, really, this
kind of fix brings a false sense of "fixing something in the code".
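
For illustration only, the pattern under discussion can be reduced to a
standalone function. bytes_to_add() is a hypothetical stand-in for the
quoted code, not the btrfs function itself. It shows that count is read
only on paths where it was assigned; some compilers may still warn about
it, which is the false positive being discussed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the quoted logic: returns the number of
 * bytes that would be passed on, or 0 when nothing is added.
 */
uint64_t bytes_to_add(uint64_t offset, uint64_t bytes, uint64_t progress)
{
	bool add_to_ctl = true;
	uint64_t count;	/* deliberately left uninitialized, as in the original */

	if (offset > progress)
		add_to_ctl = false;
	else if (offset + bytes > progress)
		count = progress - offset + 1;
	else
		count = bytes;

	/* count is read only when one of the two assignments above ran */
	if (add_to_ctl)
		return count;
	return 0;
}
```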


Re: [PATCH 00/23] btrfs device related patch set

2015-09-30 Thread David Sterba
On Wed, Sep 30, 2015 at 06:10:53AM +0800, Anand Jain wrote:
> 
> 
> On 09/29/2015 10:34 PM, David Sterba wrote:
> > On Fri, Aug 14, 2015 at 06:32:45PM +0800, Anand Jain wrote:
> >> Anand Jain (22):
> >>Btrfs: rename btrfs_sysfs_add_one to btrfs_sysfs_add_mounted
> >>Btrfs: rename btrfs_sysfs_remove_one to btrfs_sysfs_remove_mounted
> >>Btrfs: rename btrfs_kobj_add_device to btrfs_sysfs_add_device_link
> >>Btrfs: rename btrfs_kobj_rm_device to btrfs_sysfs_rm_device_link
> >>Btrfs: rename super_kobj to fsid_kobj
> >>Btrfs: SB read failure should return EIO for __bread failure
> >>Btrfs: __btrfs_std_error() logic should be consistent w/out
> >>  CONFIG_PRINTK defined
> >
> > FYI, I'm picking the above for 4.4 as they're quite straightforward,
>  > the other patches touch interfaces and I have some comments.
> 
> Thanks David.
> Except for
>[PATCH 08/23] Btrfs: device delete by devid
>[PATCH 23/23] Btrfs: allow -o rw,degraded for single group profile

In case the two patches are independent of the rest of the series, it would
be better to put them towards the end of the series. I was going down
the list and stopped at 08/23 because it introduced something nontrivial,
and then I can't be sure that skipping the single patch would not break
the whole series.

> rest are straightforward as well. If there is any comment will take it.

I'll have another look and will let you know.


Re: [PATCH v3 1/2] btrfs-progs: Introduce warning and error for common use

2015-09-30 Thread David Sterba
On Mon, Sep 28, 2015 at 09:58:13PM +0800, Zhao Lei wrote:
> And sometimes, we forgot add tailed '\n',

This is actually a good point and we don't need to put the trailing
newline to all the messages, similar to the btrfs_* macros used in
kernel. I'll update the patches.
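
For illustration only, a helper that appends the newline itself could look
like the sketch below. format_message() and the "ERROR: " prefix used in
the test are assumptions made for this example, not the btrfs-progs
implementation; the point is only that callers pass the message without a
trailing '\n'.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical formatter: writes "<prefix><message>\n" into buf so that
 * callers never have to remember the trailing newline. Returns the value
 * of the final snprintf(), or a negative value on a formatting error.
 */
int format_message(char *buf, size_t size, const char *prefix,
		   const char *fmt, ...)
{
	va_list args;
	char body[256];
	int ret;

	va_start(args, fmt);
	ret = vsnprintf(body, sizeof(body), fmt, args);
	va_end(args);
	if (ret < 0)
		return ret;

	/* the newline is added here, once, for every message */
	return snprintf(buf, size, "%s%s\n", prefix, body);
}
```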


Re: [PATCH v3 0/2] btrfs-progs: Introduce warning and error for common use

2015-09-30 Thread David Sterba
On Mon, Sep 28, 2015 at 09:58:12PM +0800, Zhao Lei wrote:
> This patch introduce warning() and error() as common function,
...
> Converting all source is a big work, this patch convert cmds-scrub.c
> We'll convert others these days, and new code can use these function
> directly.
>
> Zhao Lei (2):
>   btrfs-progs: Introduce warning and error for common use
>   btrfs-progs: use common warning/error for cmds-scrub.c

Both applied, thanks. With some changes like joining lines where
possible and shifting the messages left so they fit to ~80 chars.


Re: Add stripes filter

2015-09-30 Thread David Sterba
On Tue, Sep 29, 2015 at 12:21:39PM +, Hugo Mills wrote:
> On Tue, Sep 29, 2015 at 08:10:19AM -0400, Austin S Hemmelgarn wrote:
> > On 2015-09-29 08:00, David Sterba wrote:
> > >On Mon, Sep 28, 2015 at 05:57:05PM +, Gabríel Arthúr Pétursson wrote:
> > >>The attached patches to linux and btrfs-progs add support for filtering
> > >>based on the number of strips in a block when balancing.
> > >
> > >What usecase do you want to address? As I understand it, this would help
> > >the raid56 rebalancing to process only blockgroups that are not spread
> > >accross enough devices.
> 
>Exactly. Last week, I was trying to help Gabríel on IRC with a
> close-to-full filesystem balance it to add some new devices in a
> parity RAID configuration. He'd added the devices and balanced, but
> the usage was unequal across the devices. The only way I could think
> of dealing with it with the current tools was either to do a full
> balance repeatedly until it worked itself out, or to delve into the
> metadata with btrfs-debug-tree, and balance selected block groups
> individually.
> 
>I whinged that we needed a filter to pick just the block groups
> that weren't "as full as possible", and Gabríel picked up the idea and
> ran with it.

That's great, thanks. The stripe filters are really missing.


Re: [PATCH] Btrfs: check pending chunks when shrinking fs to avoid corruption

2015-09-30 Thread Alex Lyakas
Hi Filipe,

Looking at the code of this patch, I see that if we discover a pending
chunk, we unlock the chunk mutex, commit the transaction (which
completes the allocation of all pending chunks and inserts relevant
items into the device tree and chunk tree), and retry the search.

However, after we unlock the chunk mutex, somebody could have
attempted a new chunk allocation, which would have resulted in a new
pending chunk. On the other hand, we have done:

btrfs_device_set_total_bytes(device, new_size);

so this line should prevent anybody from allocating beyond the new size.
In that case, we are sure that on the second pass there will be no
pending chunks beyond the new size, so we can shrink to new_size
safely. Is my understanding correct?
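
For illustration only, the kind of test applied to the range being cut off,
[new_size, old_size), can be sketched as a plain overlap check. The struct
and the function range_has_pending() are hypothetical simplifications made
for this sketch; the patch itself calls contains_pending_extent(), whose
implementation is not shown in this mail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a pending device extent. */
struct pending_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Returns true if any pending extent overlaps [start, start + len),
 * i.e. the part of the device that the shrink removes.
 */
bool range_has_pending(const struct pending_extent *pending, size_t n,
		       uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t p_end = pending[i].start + pending[i].len;

		if (pending[i].start < start + len && p_end > start)
			return true;
	}
	return false;
}
```

With start = new_size and len = old_size - new_size, a true result
corresponds to the case where the patch commits the transaction and repeats
the search.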

Thanks,
Alex.



On Tue, Jun 2, 2015 at 3:43 PM,   wrote:
> From: Filipe Manana 
>
> When we shrink the usable size of a device (its total_bytes), we go over
> all the device extent items in the device tree and attempt to relocate
> the chunk of any device extent that goes beyond the new usable size for
> the device. We do that after setting the new usable size (total_bytes) in
> the device object, so that all new allocations (and reallocations) don't
> use areas of the device that go beyond the new (shorter) size. However we
> were not considering that before setting the new size in the device,
> pending chunks might have been created that use device extents that go
> beyond the new size, and those device extents are not yet in the device
> tree after we search the device tree - they are still attached to the
> list of new block group for some ongoing transaction handle, and they are
> only added to the device tree when the transaction handle is ended (via
> btrfs_create_pending_block_groups()).
>
> So check for pending chunks with device extents that go beyond the new
> size and if any exists, commit the current transaction and repeat the
> search in the device tree.
>
> Not doing this it would mean we would return success to user space while
> still having extents that go beyond the new size, and later user space
> could override those locations on the device while the fs still references
> them, causing all sorts of corruption and unexpected events.
>
> Signed-off-by: Filipe Manana 
> ---
>  fs/btrfs/volumes.c | 49 -
>  1 file changed, 40 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index dbea12e..09e89a6 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3984,6 +3984,7 @@ int btrfs_shrink_device(struct btrfs_device *device, 
> u64 new_size)
> int slot;
> int failed = 0;
> bool retried = false;
> +   bool checked_pending_chunks = false;
> struct extent_buffer *l;
> struct btrfs_key key;
> struct btrfs_super_block *super_copy = root->fs_info->super_copy;
> @@ -4064,15 +4065,6 @@ again:
> goto again;
> } else if (failed && retried) {
> ret = -ENOSPC;
> -   lock_chunks(root);
> -
> -   btrfs_device_set_total_bytes(device, old_size);
> -   if (device->writeable)
> -   device->fs_devices->total_rw_bytes += diff;
> -   spin_lock(&root->fs_info->free_chunk_lock);
> -   root->fs_info->free_chunk_space += diff;
> -   spin_unlock(&root->fs_info->free_chunk_lock);
> -   unlock_chunks(root);
> goto done;
> }
>
> @@ -4084,6 +4076,35 @@ again:
> }
>
> lock_chunks(root);
> +
> +   /*
> +* We checked in the above loop all device extents that were already 
> in
> +* the device tree. However before we have updated the device's
> +* total_bytes to the new size, we might have had chunk allocations 
> that
> +* have not complete yet (new block groups attached to transaction
> +* handles), and therefore their device extents were not yet in the
> +* device tree and we missed them in the loop above. So if we have any
> +* pending chunk using a device extent that overlaps the device range
> +* that we can not use anymore, commit the current transaction and
> +* repeat the search on the device tree - this way we guarantee we 
> will
> +* not have chunks using device extents that end beyond 'new_size'.
> +*/
> +   if (!checked_pending_chunks) {
> +   u64 start = new_size;
> +   u64 len = old_size - new_size;
> +
> +   if (contains_pending_extent(trans, device, &start, len)) {
> +   unlock_chunks(root);
> +   checked_pending_chunks = true;
> +   failed = 0;
> +   retried = false;
> +   ret = btrfs_commit_transaction(trans, root);
> +   if (ret)
> +  

Re: [PATCH v2 1/2] btrfs: Fix lost-data-profile caused by auto removing bg

2015-09-30 Thread Filipe Manana
On Wed, Sep 30, 2015 at 12:11 PM, Zhao Lei  wrote:
> Reproduce:
>  (In integration-4.3 branch)
>
>  TEST_DEV=(/dev/vdg /dev/vdh)
>  TEST_DIR=/mnt/tmp
>
>  umount "$TEST_DEV" >/dev/null
>  mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}"
>
>  mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
>  umount "$TEST_DEV"
>
>  mount -o nospace_cache "$TEST_DEV" "$TEST_DIR"
>  btrfs filesystem usage $TEST_DIR
>
> We can see the data chunk changed from raid1 to single:
>  # btrfs filesystem usage $TEST_DIR
>  Data,single: Size:8.00MiB, Used:0.00B
>     /dev/vdg    8.00MiB
>  #
>
> Reason:
>  When a empty filesystem mount with -o nospace_cache, the last
>  data blockgroup will be auto-removed in umount.
>
>  Then if we mount it again, there is no data chunk in the
>  filesystem, so the only available data profile is 0x0, result
>  is all new chunks are created as single type.
>
> Fix:
>  Don't auto-delete last blockgroup for a raid type.
>
> Test:
>  Test by above script, and confirmed the logic by debug output.
>
> Changelog v1->v2:
> 1: Put code of checking block_group->list into
>semaphore of space_info->groups_sem.
> Noticed-by: Filipe Manana 
>
> Signed-off-by: Zhao Lei 

Reviewed-by: Filipe Manana 

I would have made the check in the "if" statement below that is
already done while holding a write lock on the semaphore (smaller code
diff), but this is equally correct.

thanks
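
For illustration only, the "last block group" test in the hunk below relies
on a property of circular doubly linked lists: an entry whose next and prev
pointers are equal has no siblings, because both point at the list head.
The minimal list below is a userspace stand-in written for this sketch, not
the kernel's list.h.

```c
#include <stdbool.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* True when entry has no siblings: both neighbours are the list head. */
bool is_only_entry(const struct list_head *entry)
{
	return entry->next == entry->prev;
}
```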

> ---
>  fs/btrfs/extent-tree.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 79a5bd9..ed9426c 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10010,8 +10010,18 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info 
> *fs_info)
> block_group = list_first_entry(&fs_info->unused_bgs,
>struct btrfs_block_group_cache,
>bg_list);
> -   space_info = block_group->space_info;
> list_del_init(&block_group->bg_list);
> +
> +   space_info = block_group->space_info;
> +
> +   down_read(&space_info->groups_sem);
> +   if (block_group->list.next == block_group->list.prev) {
> +   up_read(&space_info->groups_sem);
> +   btrfs_put_block_group(block_group);
> +   continue;
> +   }
> +   up_read(&space_info->groups_sem);
> +
> if (ret || btrfs_mixed_space_info(space_info)) {
> btrfs_put_block_group(block_group);
> continue;
> --
> 1.8.5.1
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: [PATCH] btrfs: add stripes filter

2015-09-30 Thread David Sterba
Hi,

thanks for the patch. The stripe filter is really helpful. There are
some minor comments below but otherwise the patch looks good.

On Mon, Sep 28, 2015 at 10:32:41PM +, Gabríel Arthúr Pétursson wrote:
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -849,7 +849,11 @@ struct btrfs_disk_balance_args {
>   /* BTRFS_BALANCE_ARGS_LIMIT value */
>   __le64 limit;
>  
> - __le64 unused[7];
> + /* btrfs stripes filter */
> + __le64 sstart;
> + __le64 send;

Please be more descriptive, eg. min_stripes/max_stripes. The u64 type
seems too much, I think we can fit the stripe count into a 32bit number.
I made a mistake with u64 type for the 'limit' filter but I think that
we can somehow extend it to be two u32 with the min/max meaning as well.
Either way, this is independent of your patch.
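
For illustration only, the suggestion above (descriptive names, 32-bit
bounds) could look like the sketch below. The struct stripes_range and the
function stripes_filter_skips() are hypothetical names made up for this
example; the return convention follows the hunk further down, where a
nonzero filter result means the chunk is not balanced.

```c
#include <stdint.h>

/* Hypothetical layout: two 32-bit bounds share one 64-bit slot. */
struct stripes_range {
	uint32_t min_stripes;
	uint32_t max_stripes;
};

/*
 * Returns 1 if the chunk should be skipped (its stripe count is outside
 * the requested range), 0 if it should be balanced.
 */
int stripes_filter_skips(uint32_t num_stripes, const struct stripes_range *r)
{
	if (r->min_stripes <= num_stripes && num_stripes <= r->max_stripes)
		return 0;
	return 1;
}
```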

> +
> + __le64 unused[5];
>  } __attribute__ ((__packed__));
>  
>  /*
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3236,6 +3248,12 @@ static int should_balance_chunk(struct btrfs_root 
> *root,
>   return 0;
>   }
>  
> + /* stripes filter */
> + if ((bargs->flags & BTRFS_BALANCE_ARGS_STRIPES) &&
> + chunk_stripes_filter(leaf, chunk, bargs)) {
> + return 0;
> + }

Ok, I think that this ordering of the filters is right.

> +
>   /* soft profile changing mode */
>   if ((bargs->flags & BTRFS_BALANCE_ARGS_SOFT) &&
>   chunk_soft_convert_filter(chunk_type, bargs)) {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 2ca784a..fb6b89a 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -375,6 +375,7 @@ struct map_lookup {
>  #define BTRFS_BALANCE_ARGS_DRANGE    (1ULL << 3)
>  #define BTRFS_BALANCE_ARGS_VRANGE    (1ULL << 4)
>  #define BTRFS_BALANCE_ARGS_LIMIT     (1ULL << 5)
> +#define BTRFS_BALANCE_ARGS_STRIPES   (1ULL << 6)
>  
>  /*
>   * Profile changing flags.  When SOFT is set we won't relocate chunk if
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index b6dec05..a7819d0 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -218,7 +218,11 @@ struct btrfs_balance_args {
>   __u64 flags;
>  
>   __u64 limit;    /* limit number of processed chunks */
> - __u64 unused[7];

same comment from the ctree.h applies here

> +
> + __u64 sstart;
> + __u64 send;
> +
> + __u64 unused[5];
>  } __attribute__ ((__packed__));
>  
>  /* report balance progress to userspace */


Re: Add stripes filter

2015-09-30 Thread David Sterba
On Tue, Sep 29, 2015 at 08:10:19AM -0400, Austin S Hemmelgarn wrote:
> On 2015-09-29 08:00, David Sterba wrote:
> > On Mon, Sep 28, 2015 at 05:57:05PM +, Gabríel Arthúr Pétursson wrote:
> >> The attached patches to linux and btrfs-progs add support for filtering
> >> based on the number of strips in a block when balancing.
> >
> > What usecase do you want to address? As I understand it, this would help
> > the raid56 rebalancing to process only blockgroups that are not spread
> > accross enough devices.
> This could also be helpful when reshaping a raid10 or raid0 setup.

Right, I forgot to mention it, I should have said "any raid profile that
uses stripes".
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] btrfs-progs: device delete to accept devid

2015-09-30 Thread David Sterba
On Wed, Sep 30, 2015 at 06:03:42AM +0800, Anand Jain wrote:
> +struct btrfs_ioctl_vol_args_v3 {

Can we use struct btrfs_ioctl_vol_args_v2 for that purpose? It contains
the 'flags' so we can abuse the name field to store the device id and
set the flags accordingly.

> + __s64 fd;
> + char name[BTRFS_PATH_NAME_MAX + 1];
> + __u64 devid;
> +};
> +
>  #define BTRFS_DEVICE_PATH_NAME_MAX 1024
>  
>  #define BTRFS_SUBVOL_CREATE_ASYNC    (1ULL << 0)
> @@ -683,6 +689,8 @@ static inline char *btrfs_err_str(enum btrfs_err_code 
> err_code)
>struct btrfs_ioctl_feature_flags[2])
>  #define BTRFS_IOC_GET_SUPPORTED_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 57, \
>struct btrfs_ioctl_feature_flags[3])
> +#define BTRFS_IOC_RM_DEV_V2  _IOW(BTRFS_IOCTL_MAGIC, 58, \
> +struct btrfs_ioctl_vol_args_v3)

And we can reuse the ioctl number 11

#define BTRFS_IOC_RM_DEV_V2 _IOW(BTRFS_IOCTL_MAGIC, 11, \
   struct btrfs_ioctl_vol_args_v2)

The vol_v2 structure is extensible so we can add more functionality
there and then I think it justifies the V2 interface bump.
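
For illustration only, the idea of reusing a flags-carrying structure and
letting the name field double as a device id can be sketched as follows.
Everything here is hypothetical: the struct is a simplified stand-in and
not the real btrfs_ioctl_vol_args_v2 layout, and the SKETCH_* constants and
set_device_spec() are names invented for this example.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_NAME_MAX		4039		/* assumed buffer size */
#define SKETCH_SPEC_BY_ID	(1ULL << 3)	/* hypothetical flag bit */

/* Simplified stand-in: the flag says which union member is valid. */
struct sketch_vol_args_v2 {
	int64_t fd;
	uint64_t flags;
	union {
		char name[SKETCH_NAME_MAX + 1];
		uint64_t devid;
	};
};

/* Fill the arguments for a removal either by path or by device id. */
void set_device_spec(struct sketch_vol_args_v2 *args, const char *path,
		     uint64_t devid)
{
	memset(args, 0, sizeof(*args));
	if (path) {
		/* buffer was zeroed above, so the copy stays NUL-terminated */
		strncpy(args->name, path, SKETCH_NAME_MAX);
	} else {
		args->flags |= SKETCH_SPEC_BY_ID;
		args->devid = devid;
	}
}
```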


Re: [PATCH 00/11] using of calloc instead of malloc+memset

2015-09-30 Thread David Sterba
On Tue, Sep 29, 2015 at 07:10:35PM +0200, Silvio Fricke wrote:
> Silvio Fricke (11):
>   btrfs-progs: use calloc instead of malloc+memset for btrfs-image.c
>   btrfs-progs: use calloc instead of malloc+memset for btrfs-list.c
>   btrfs-progs: use calloc instead of malloc+memset for chunk-recover.c
>   btrfs-progs: use calloc instead of malloc+memset for cmds-check.c
>   btrfs-progs: use calloc instead of malloc+memset for disk-io.c
>   btrfs-progs: use calloc instead of malloc+memset for extent_io.c
>   btrfs-progs: use calloc instead of malloc+memset for mkfs.c
>   btrfs-progs: use calloc instead of malloc+memset for qgroup.c
>   btrfs-progs: use calloc instead of malloc+memset for quick-test.c
>   btrfs-progs: use calloc instead of malloc+memset for volumes.c

Thanks.

I think that all of these can be squeezed into one, the change
is logically the same in all the files and easy to review even in a big
patch. So I'll do that, the last patch will be separate as it's a fix.
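
For illustration only, the transformation applied throughout the series has
the shape below. The two function names are made up for this sketch; the
first shows the malloc+memset pattern being replaced, the second the
calloc() form.

```c
#include <stdlib.h>
#include <string.h>

/* Pattern being replaced: allocate, then clear by hand. */
void *alloc_zeroed_old(size_t nmemb, size_t size)
{
	/* note: the product nmemb * size is not checked for overflow here */
	void *p = malloc(nmemb * size);

	if (p)
		memset(p, 0, nmemb * size);
	return p;
}

/* Replacement: calloc() returns memory that is already zero-filled. */
void *alloc_zeroed_new(size_t nmemb, size_t size)
{
	return calloc(nmemb, size);
}
```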