[PATCH] btrfs: drop unused space_info parameter from create_space_info

2018-05-27 Thread Lu Fengqi
Since commit dc2d3005d27d ("btrfs: remove dead create_space_info
calls"), create_space_info has only one caller, btrfs_init_space_info,
and that caller never uses the space_info returned through the output
parameter. Drop the parameter.

Signed-off-by: Lu Fengqi 
---
 fs/btrfs/extent-tree.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2c65a09535b6..31627de751d1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4007,8 +4007,7 @@ static const char *alloc_name(u64 flags)
};
 }
 
-static int create_space_info(struct btrfs_fs_info *info, u64 flags,
-struct btrfs_space_info **new)
+static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 {
 
struct btrfs_space_info *space_info;
@@ -4046,7 +4045,6 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags,
return ret;
}
 
-   *new = space_info;
list_add_rcu(&space_info->list, &info->space_info);
if (flags & BTRFS_BLOCK_GROUP_DATA)
info->data_sinfo = space_info;
@@ -10810,7 +10808,6 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 
 int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
 {
-   struct btrfs_space_info *space_info;
struct btrfs_super_block *disk_super;
u64 features;
u64 flags;
@@ -10826,21 +10823,21 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
mixed = 1;
 
flags = BTRFS_BLOCK_GROUP_SYSTEM;
-   ret = create_space_info(fs_info, flags, &space_info);
+   ret = create_space_info(fs_info, flags);
if (ret)
goto out;
 
if (mixed) {
flags = BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA;
-   ret = create_space_info(fs_info, flags, &space_info);
+   ret = create_space_info(fs_info, flags);
} else {
flags = BTRFS_BLOCK_GROUP_METADATA;
-   ret = create_space_info(fs_info, flags, &space_info);
+   ret = create_space_info(fs_info, flags);
if (ret)
goto out;
 
flags = BTRFS_BLOCK_GROUP_DATA;
-   ret = create_space_info(fs_info, flags, &space_info);
+   ret = create_space_info(fs_info, flags);
}
 out:
return ret;
-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: balance dirty metadata pages in btrfs_finish_ordered_io

2018-05-27 Thread Ethan Lien
[Problem description and how we fix it]
We should balance dirty metadata pages at the end of
btrfs_finish_ordered_io, since a small, unmergeable random write can
produce dirty metadata that is several times larger than the data
itself. For example, a small, unmergeable 4KiB write may produce:

16KiB dirty leaf (and possibly 16KiB dirty node) in subvolume tree
16KiB dirty leaf (and possibly 16KiB dirty node) in checksum tree
16KiB dirty leaf (and possibly 16KiB dirty node) in extent tree
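
The figures above can be turned into a small worked calculation. The sketch below is only an illustration, assuming a 16KiB nodesize and exactly the three trees listed. The helper names are hypothetical and do not exist in btrfs. Under these assumptions, one 4KiB write dirties between 48KiB (leaves only) and 96KiB (leaves plus one node per tree) of metadata, which is 12 to 24 times the size of the data.

```c
#include <assert.h>
#include <stdint.h>

#define KIB 1024u

/* Assumption: 16KiB nodesize, as in the figures quoted above. */
#define NODESIZE (16u * KIB)

/* Trees touched by one small write: subvolume, checksum and extent tree. */
#define TREES_TOUCHED 3u

/*
 * Worst-case dirty metadata for one unmergeable write: every touched tree
 * gets one dirty leaf, plus one dirty node when with_nodes is non-zero.
 */
static uint32_t worst_case_dirty_metadata(int with_nodes)
{
	uint32_t per_tree = NODESIZE + (with_nodes ? NODESIZE : 0u);

	return TREES_TOUCHED * per_tree;
}

/* Ratio of dirty metadata to written data; data_bytes must be non-zero. */
static uint32_t amplification(uint32_t data_bytes, int with_nodes)
{
	assert(data_bytes > 0);
	return worst_case_dirty_metadata(with_nodes) / data_bytes;
}
```

Calling amplification(4096, 1) gives the upper bound for the 4KiB example, and amplification(4096, 0) gives the bound when only leaves are dirtied.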

Although we do balance dirty pages on the write side, in the buffered
write path most metadata is dirtied only after we reach the dirty
background limit (which so far counts only dirty data pages) and wake
up the flusher thread. If there are many small, unmergeable random
writes spread across a large btree, a burst of dirty pages exceeds the
dirty_bytes limit after we wake up the flusher thread, which is not
what we expect. On our machine, this caused an out-of-memory problem,
since a page cannot be dropped while it is marked dirty.

One may worry that we can sleep in btrfs_btree_balance_dirty_nodelay,
but since btrfs_finish_ordered_io runs in a separate worker, sleeping
there does not stop the flusher from consuming dirty pages. Also, since
metadata writeback endio uses a different worker, sleeping in
btrfs_finish_ordered_io helps us throttle the amount of dirty metadata
pages.

[Reproduce steps]
To reproduce the problem, we need to issue 4KiB writes randomly spread
across a large btree. On our machine with 2GiB of RAM:
1) Create 4 subvolumes.
2) Run fio on each subvolume:

   [global]
   direct=0
   rw=randwrite
   ioengine=libaio
   bs=4k
   iodepth=16
   numjobs=1
   group_reporting
   size=128G
   runtime=1800
   norandommap
   time_based
   randrepeat=0

3) Take a snapshot of each subvolume and rerun fio on the existing files.
4) Repeat step (3) until we get large btrees.
   In our case, by observing btrfs_root_item->bytes_used, we have 2GiB of
   metadata in each subvolume tree and 12GiB of metadata in extent tree.
5) Stop all fio jobs, take snapshots again, and wait until all delayed
   work is completed.
6) Start all fio jobs. A few seconds later we hit OOM when the flusher
   starts to work.

The problem can be reproduced even with nocow writes.

Signed-off-by: Ethan Lien 
---

V2:
Replace btrfs_btree_balance_dirty with btrfs_btree_balance_dirty_nodelay.
Add reproduce steps.

 fs/btrfs/inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8e604e7071f1..e54547df24ee 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3158,6 +3158,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
/* once for the tree */
btrfs_put_ordered_extent(ordered_extent);
 
+   btrfs_btree_balance_dirty_nodelay(fs_info);
+
return ret;
 }
 
-- 
2.17.0



Re: off-by-one uncompressed invalid ram_bytes corruptions

2018-05-27 Thread Qu Wenruo


On 2018-05-28 11:47, Steve Leung wrote:
> On 05/26/2018 06:57 PM, Qu Wenruo wrote:
>>
>>
>> On 2018-05-26 22:06, Steve Leung wrote:
>>> On 05/20/2018 07:07 PM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2018-05-21 04:43, Steve Leung wrote:
>>>>> On 05/19/2018 07:02 PM, Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>> On 2018-05-20 07:40, Steve Leung wrote:
>>>>>>> On 05/17/2018 11:49 PM, Qu Wenruo wrote:
>>>>>>>> On 2018-05-18 13:23, Steve Leung wrote:
>>>>>>>>> Hi list,
>>>>>>>>>
>>>>>>>>> I've got 3-device raid1 btrfs filesystem that's throwing up some
>>>>>>>>> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
>>>>>>>>> observed lately:
>>>>>>>>>
>>>>>>>>>   BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970196795392
>>>>>>>>> slot=307 ino=206231 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 3468 expect 3469
>>>>>>>>
>>>>>>>> Would you please use "btrfs-debug-tree -b 4970196795392
>>>>>>>> /dev/sda1" to
>>>>>>>> dump the leaf?
>>>>>>>
>>>>>>> Attached btrfs-debug-tree dumps for all of the blocks that I saw
>>>>>>> messages for.
>>>>>>>
>>>>>>>> It's caught by tree-checker code which is ensuring all tree blocks
>>>>>>>> are
>>>>>>>> correct before btrfs can take use of them.
>>>>>>>>
>>>>>>>> That inline extent size check is tested, so I'm wondering if this
>>>>>>>> indicates any real corruption.
>>>>>>>> That btrfs-debug-tree output will definitely help.
>>>>>>>>
>>>>>>>> BTW, if I didn't miss anything, there should not be any inlined
>>>>>>>> extent
>>>>>>>> in root tree.
>>>>>>>>
>>>>>>>>>   BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970552426496
>>>>>>>>> slot=91 ino=209736 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 3496 expect 3497
>>>>>>>>
>>>>>>>> Same dump will definitely help.
>>>>>>>>
>>>>>>>>>   BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970712399872
>>>>>>>>> slot=221 ino=205230 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 1790 expect 1791
>>>>>>>>>   BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970803920896
>>>>>>>>> slot=368 ino=205732 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 2475 expect 2476
>>>>>>>>>   BTRFS critical (device sda1): corrupt leaf: root=1
>>>>>>>>> block=4970987945984
>>>>>>>>> slot=236 ino=208896 file_offset=0, invalid ram_bytes for
>>>>>>>>> uncompressed
>>>>>>>>> inline extent, have 490 expect 491
>>>>>>>>>
>>>>>>>>> All of them seem to be 1 short of the expected value.
>>>>>>>>>
>>>>>>>>> Some files do seem to be inaccessible on the filesystem, and btrfs
>>>>>>>>> inspect-internal on any of those inode numbers fails with:
>>>>>>>>>
>>>>>>>>>  ERROR: ino paths ioctl: Input/output error
>>>>>>>>>
>>>>>>>>> and another message for that inode appears.
>>>>>>>>>
>>>>>>>>> 'btrfs check' (output attached) seems to notice these corruptions
>>>>>>>>> (among
>>>>>>>>> a few others, some of which seem to be related to a problematic
>>>>>>>>> attempt
>>>>>>>>> to build Android I posted about some months ago).
>>>>>>>>>
>>>>>>>>> Other information:
>>>>>>>>>
>>>>>>>>> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The
>>>>>>>>> filesystem
>>>>>>>>> has
>>>>>>>>> about 25 snapshots at the moment, only a handful of compressed
>>>>>>>>> files,
>>>>>>>>> and nothing fancy like qgroups enabled.
>>>>>>>>>
>>>>>>>>> btrfs fi show:
>>>>>>>>>
>>>>>>>>>  Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>>>>>>>>>  Total devices 4 FS bytes used 2.48TiB
>>>>>>>>>  devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>>>>>>>>>  devid    2 size 464.73GiB used 230.00GiB path
>>>>>>>>> /dev/sdc1
>>>>>>>>>  devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>>>>>>>>>  devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
>>>>>>>>>
>>>>>>>>> btrfs fi df:
>>>>>>>>>
>>>>>>>>>  Data, RAID1: total=2.49TiB, used=2.48TiB
>>>>>>>>>  System, RAID1: total=32.00MiB, used=416.00KiB
>>>>>>>>>  Metadata, RAID1: total=7.00GiB, used=5.29GiB
>>>>>>>>>  GlobalReserve, single: total=512.00MiB, used=0.00B
>>>>>>>>>
>>>>>>>>> dmesg output attached as well.
>>>>>>>>>
>>>>>>>>> Thanks in advance for any assistance!  I have backups of all the
>>>>>>>>> important stuff here but it would be nice to fix the
>>>>>>>>> corruptions in
>>>>>>>>> place.
>>>>>>>>
>>>>>>>> And btrfs check doesn't report the same problem as the default
>>>>>>>> original
>>>>>>>> mode doesn't have such check.
>>>>>>>>
>>>>>>>> Please also post the result of "btrfs check --mode=lowmem
>>>>>>>> /dev/sda1"
>>>>>>>
>>>>>>> Also, attached.  It seems to notice the same off-by-one problems,
>>>>>>> though
>>>>>>> there also seem to be a couple of examples of being off by more than
>>>>>>> one.
>>>>>>
>>>>>> Unfortunately, it doesn't detect, as there is no off-by-one error at

Re: off-by-one uncompressed invalid ram_bytes corruptions

2018-05-27 Thread Steve Leung

On 05/26/2018 06:57 PM, Qu Wenruo wrote:



On 2018-05-26 22:06, Steve Leung wrote:

On 05/20/2018 07:07 PM, Qu Wenruo wrote:



On 2018-05-21 04:43, Steve Leung wrote:

On 05/19/2018 07:02 PM, Qu Wenruo wrote:



On 2018-05-20 07:40, Steve Leung wrote:

On 05/17/2018 11:49 PM, Qu Wenruo wrote:

On 2018-05-18 13:23, Steve Leung wrote:

Hi list,

I've got 3-device raid1 btrfs filesystem that's throwing up some
"corrupt leaf" errors in dmesg.  This is a uniquified list I've
observed lately:



  BTRFS critical (device sda1): corrupt leaf: root=1
block=4970196795392
slot=307 ino=206231 file_offset=0, invalid ram_bytes for
uncompressed
inline extent, have 3468 expect 3469


Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
dump the leaf?


Attached btrfs-debug-tree dumps for all of the blocks that I saw
messages for.


It's caught by tree-checker code which is ensuring all tree blocks
are
correct before btrfs can take use of them.

That inline extent size check is tested, so I'm wondering if this
indicates any real corruption.
That btrfs-debug-tree output will definitely help.

BTW, if I didn't miss anything, there should not be any inlined
extent
in root tree.


  BTRFS critical (device sda1): corrupt leaf: root=1
block=4970552426496
slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
inline extent, have 3496 expect 3497


Same dump will definitely help.


  BTRFS critical (device sda1): corrupt leaf: root=1
block=4970712399872
slot=221 ino=205230 file_offset=0, invalid ram_bytes for
uncompressed
inline extent, have 1790 expect 1791
  BTRFS critical (device sda1): corrupt leaf: root=1
block=4970803920896
slot=368 ino=205732 file_offset=0, invalid ram_bytes for
uncompressed
inline extent, have 2475 expect 2476
  BTRFS critical (device sda1): corrupt leaf: root=1
block=4970987945984
slot=236 ino=208896 file_offset=0, invalid ram_bytes for
uncompressed
inline extent, have 490 expect 491

All of them seem to be 1 short of the expected value.

Some files do seem to be inaccessible on the filesystem, and btrfs
inspect-internal on any of those inode numbers fails with:

 ERROR: ino paths ioctl: Input/output error

and another message for that inode appears.

'btrfs check' (output attached) seems to notice these corruptions
(among
a few others, some of which seem to be related to a problematic
attempt
to build Android I posted about some months ago).

Other information:

Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem
has
about 25 snapshots at the moment, only a handful of compressed
files,
and nothing fancy like qgroups enabled.

btrfs fi show:

 Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
     Total devices 4 FS bytes used 2.48TiB
     devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
     devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
     devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
     devid    4 size 3.49TiB used 2.49TiB path /dev/sda1

btrfs fi df:

 Data, RAID1: total=2.49TiB, used=2.48TiB
 System, RAID1: total=32.00MiB, used=416.00KiB
 Metadata, RAID1: total=7.00GiB, used=5.29GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg output attached as well.

Thanks in advance for any assistance!  I have backups of all the
important stuff here but it would be nice to fix the corruptions in
place.


And btrfs check doesn't report the same problem as the default
original
mode doesn't have such check.

Please also post the result of "btrfs check --mode=lowmem /dev/sda1"


Also, attached.  It seems to notice the same off-by-one problems,
though
there also seem to be a couple of examples of being off by more than
one.


Unfortunately, it doesn't detect, as there is no off-by-one error at
all.

The problem is, kernel is reporting error on completely fine leaf.

Further more, even in the same leaf, there are more inlined extents,
and
they are all valid.

So the kernel reports the error out of nowhere.

More problems happens for extent_size where a lot of them is offset by
one.

Moreover, the root owner is not printed correctly, thus I'm
wondering if
the memory is corrupted.

Please try memtest+ to verify all your memory is correct, and if so,
please try the attached patch and to see if it provides extra info.


Memtest ran for about 12 hours last night, and didn't find any errors.

New messages from patched kernel:

   BTRFS critical (device sdd1): corrupt leaf: root=1 block=4970196795392
slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
inline extent, have 3468 expect 3469 (21 + 3448)


This output doesn't match with debug-tree dump.

item 307 key (206231 EXTENT_DATA 0) itemoff 15118 itemsize 3468
 generation 692987 type 0 (inline)
 inline extent data size 3447 ram_bytes 3447 compression 0 (none)

Where its ram_bytes is 3447, not 3448.
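
The relationship being compared here can be sketched in a few lines of C. This is only an illustration, not the kernel's tree-checker code: the helper names are hypothetical, and the 21-byte header size is taken from the "(21 + 3448)" breakdown in the patched kernel's message. With the values from the debug-tree dump (itemsize 3468, ram_bytes 3447) the check passes, while the ram_bytes of 3448 that the kernel reported would make it fail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: size of the file extent item header that precedes the inline
 * data, taken from the "(21 + 3448)" breakdown in the kernel message.
 */
#define INLINE_HEADER_SIZE 21u

/* Item size expected for an uncompressed inline extent. */
static uint32_t expected_item_size(uint64_t ram_bytes)
{
	assert(ram_bytes <= UINT32_MAX - INLINE_HEADER_SIZE);
	return INLINE_HEADER_SIZE + (uint32_t)ram_bytes;
}

/* Returns 1 when item size and ram_bytes agree, 0 otherwise. */
static int inline_extent_size_ok(uint32_t item_size, uint64_t ram_bytes)
{
	return item_size == expected_item_size(ram_bytes);
}
```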

Further more, there are 2 more inlined extent, if something really went
w

[PATCH v2 6/6] btrfs-progs: add chattr support for send/receive

2018-05-27 Thread Howard McLauchlan
From: Howard McLauchlan 

Presently, btrfs send/receive does not propagate inode attribute flags;
all chattr operations are effectively discarded upon transmission.

This patch adds userspace support for inode attribute flags. Kernel
support can be found under the commit:

btrfs: add chattr support for send/receive

An associated xfstest can also be found at:

btrfs: verify chattr support for send/receive test

Signed-off-by: Howard McLauchlan 
---
 cmds-receive.c | 37 +++++++++++++++++++++++++++++++++++++
 send-dump.c    |  6 ++++++
 send-stream.c  |  5 +++++
 send-stream.h  |  1 +
 4 files changed, 49 insertions(+)

diff --git a/cmds-receive.c b/cmds-receive.c
index 20e593f7..2a841bfc 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ctree.h"
 #include "ioctl.h"
@@ -1201,6 +1202,41 @@ out:
return ret;
 }
 
+static int process_chattr(const char *path, u64 flags, void *user)
+{
+   int ret = 0;
+   int fd = 0;
+   int _flags = flags;
+   struct btrfs_receive *rctx = user;
+   char full_path[PATH_MAX];
+
+   ret = path_cat_out(full_path, rctx->full_subvol_path, path);
+   if (ret < 0) {
+   error("chattr: path invalid: %s", path);
+   goto out;
+   }
+
+   if (g_verbose >= 2)
+   fprintf(stderr, "chattr %s - flags=0%o\n", path, (int)flags);
+
+   fd = open(full_path, O_RDONLY);
+   if (fd < 0) {
+   ret = -errno;
+   error("cannot open %s: %s", path, strerror(-ret));
+   goto out;
+   }
+
+   ret = ioctl(fd, FS_IOC_SETFLAGS, &_flags);
+   if (ret < 0) {
+   ret = -errno;
+   error("chattr %s failed: %s", path, strerror(-ret));
+   goto out;
+   }
+
+out:
+   return ret;
+}
+
 static struct btrfs_send_ops send_ops = {
.subvol = process_subvol,
.snapshot = process_snapshot,
@@ -1225,6 +1261,7 @@ static struct btrfs_send_ops send_ops = {
.update_extent = process_update_extent,
.total_data_size = process_total_data_size,
.fallocate = process_fallocate,
+   .chattr = process_chattr,
 };
 
 static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
diff --git a/send-dump.c b/send-dump.c
index c5a695a2..15aea402 100644
--- a/send-dump.c
+++ b/send-dump.c
@@ -331,6 +331,11 @@ static int print_fallocate(const char *path, u32 flags, u64 offset, u64 len,
  len);
 }
 
+static int print_chattr(const char *path, u64 flags, void *user)
+{
+   return PRINT_DUMP(user, path, "chattr", "flags=%llu", flags);
+}
+
 struct btrfs_send_ops btrfs_print_send_ops = {
.subvol = print_subvol,
.snapshot = print_snapshot,
@@ -355,4 +360,5 @@ struct btrfs_send_ops btrfs_print_send_ops = {
.update_extent = print_update_extent,
.total_data_size = print_total_data_size,
.fallocate = print_fallocate,
+   .chattr = print_chattr,
 };
diff --git a/send-stream.c b/send-stream.c
index 74ec37dd..4f26fae3 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -470,6 +470,11 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx)
sctx->user);
}
break;
+   case BTRFS_SEND_C_CHATTR:
+   TLV_GET_STRING(sctx, BTRFS_SEND_A_PATH, &path);
+   TLV_GET_U64(sctx, BTRFS_SEND_A_CHATTR, &tmp);
+   ret = sctx->ops->chattr(path, tmp, sctx->user);
+   break;
case BTRFS_SEND_C_END:
ret = 1;
break;
diff --git a/send-stream.h b/send-stream.h
index 89e64043..a9f08d52 100644
--- a/send-stream.h
+++ b/send-stream.h
@@ -69,6 +69,7 @@ struct btrfs_send_ops {
int (*total_data_size)(u64 size, void *user);
int (*fallocate)(const char *path, u32 flags, u64 offset,
 u64 len, void *user);
+   int (*chattr)(const char *path, u64 flags, void *user);
 };
 
 int btrfs_read_and_process_send_stream(int fd,
-- 
2.17.0
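
For readers who want to try the receive-side operation in isolation, here is a minimal standalone sketch of applying inode flags with FS_IOC_SETFLAGS. It is not the code from the patch: set_inode_flags is a hypothetical helper, and the sketch assumes that the flags fit in an int, which is the type the ioctl reads. Unlike the hunk above, it closes the file descriptor on every path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: apply inode attribute flags to the file at path.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_inode_flags(const char *path, unsigned long long flags)
{
	int iflags = (int)flags;	/* FS_IOC_SETFLAGS reads an int */
	int ret = 0;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	if (ioctl(fd, FS_IOC_SETFLAGS, &iflags) < 0)
		ret = -errno;

	close(fd);	/* released on success and on failure */
	return ret;
}
```

A caller would pass the path built from the subvolume path and the flags value read from the stream. Setting some flags, such as the immutable flag, requires additional privileges.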



[PATCH v2 1/6] Btrfs-progs: send, bump stream version

2018-05-27 Thread Howard McLauchlan
From: Filipe Manana 

This increases the send stream version from version 1 to version 2, adding
new commands:

1) total data size - used to tell the receiver how much file data the stream
   will add or update;

2) fallocate - used to pre-allocate space for files and to punch holes in files;

3) inode set flags (chattr);

4) set inode otime;

5) sending compressed writes

This is preparation work for subsequent changes that implement the new features.

This doesn't break compatibility with older kernels or clients. In order to get
a version 2 send stream, new flags must be passed to the send ioctl.
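
The version selection described above amounts to parsing an integer and checking it against the supported range. The following is a minimal sketch of that check, not the code from this patch: parse_stream_version and STREAM_VERSION_MAX are hypothetical names, and the sketch assumes that version 2 is the highest supported version. It uses strtol rather than sscanf so that trailing characters (for example "2abc") are rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumption: version 2 is the highest supported stream version. */
#define STREAM_VERSION_MAX 2

/*
 * Hypothetical helper: parse a stream version argument.
 * Returns the version (1..STREAM_VERSION_MAX), -EINVAL for malformed
 * input, or -ERANGE for a well-formed but unsupported version.
 */
static int parse_stream_version(const char *arg)
{
	char *end = NULL;
	long v;

	assert(arg != NULL);

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0')
		return -EINVAL;
	if (v < 1 || v > STREAM_VERSION_MAX)
		return -ERANGE;
	return (int)v;
}
```

A caller would map a negative return value to an error message and a non-zero exit status, as the option handling in the diff does.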

Signed-off-by: Filipe David Borba Manana 
---
 cmds-send.c   | 61 ---
 ioctl.h   | 15 +
 send-stream.c |  2 +-
 send.h| 24 +++-
 4 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/cmds-send.c b/cmds-send.c
index c5ecdaa1..0ec557c7 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -52,6 +52,7 @@
  * the 'At subvol' message.
  */
 static int g_verbose = 1;
+static int g_stream_version = BTRFS_SEND_STREAM_VERSION_1;
 
 struct btrfs_send {
int send_fd;
@@ -343,6 +344,8 @@ static int do_send(struct btrfs_send *send, u64 parent_root_id,
io_send.flags |= BTRFS_SEND_FLAG_OMIT_STREAM_HEADER;
if (!is_last_subvol)
io_send.flags |= BTRFS_SEND_FLAG_OMIT_END_CMD;
+   if (g_stream_version == BTRFS_SEND_STREAM_VERSION_2)
+   io_send.flags |= BTRFS_SEND_FLAG_STREAM_V2;
ret = ioctl(subvol_fd, BTRFS_IOC_SEND, &io_send);
if (ret < 0) {
ret = -errno;
@@ -513,7 +516,8 @@ int cmd_send(int argc, char **argv)
static const struct option long_options[] = {
{ "verbose", no_argument, NULL, 'v' },
{ "quiet", no_argument, NULL, 'q' },
-   { "no-data", no_argument, NULL, GETOPT_VAL_SEND_NO_DATA }
+   { "no-data", no_argument, NULL, GETOPT_VAL_SEND_NO_DATA },
+   { "stream-version", 1, NULL, 'V' },
};
int c = getopt_long(argc, argv, "vqec:f:i:p:", long_options, NULL);
 
@@ -597,6 +601,24 @@ int cmd_send(int argc, char **argv)
error("option -i was removed, use -c instead");
ret = 1;
goto out;
+   case 'V':
+   if (sscanf(optarg, "%d", &g_stream_version) != 1) {
+   fprintf(stderr,
+   "ERROR: invalid value for stream version: %s\n",
+   optarg);
+   ret = 1;
+   goto out;
+   }
+   if (g_stream_version <= 0 ||
+   g_stream_version > BTRFS_SEND_STREAM_VERSION_MAX) {
+   fprintf(stderr,
+   "ERROR: unsupported stream version %d, minimum: 1, maximum: %d\n",
+   g_stream_version,
+   BTRFS_SEND_STREAM_VERSION_MAX);
+   ret = 1;
+   goto out;
+   }
+   break;
case GETOPT_VAL_SEND_NO_DATA:
send_flags |= BTRFS_SEND_FLAG_NO_FILE_DATA;
break;
@@ -776,7 +798,7 @@ out:
 }
 
 const char * const cmd_send_usage[] = {
-   "btrfs send [-ve] [-p ] [-c ] [-f ] 
 [...]",
+   "btrfs send [-ve] [--stream-version ] [-p ] [-c 
] [-f ]  [...]",
"Send the subvolume(s) to stdout.",
"Sends the subvolume(s) specified by  to stdout.",
" should be read-only here.",
@@ -790,21 +812,24 @@ const char * const cmd_send_usage[] = {
"which case 'btrfs send' will determine a suitable parent among the",
"clone sources itself.",
"\n",
-   "-e   If sending multiple subvols at once, use the new",
-   " format and omit the end-cmd between the subvols.",
-   "-p   Send an incremental stream from  to",
-   " .",
-   "-cUse this snapshot as a clone source for an ",
-   " incremental send (multiple allowed)",
-   "-f  Output is normally written to stdout. To write to",
-   " a file, use this option. An alternative would be to",
-   " use pipes.",
-   "--no-datasend in NO_FILE_DATA mode, Note: the output stream",
-   " does not contain any file data and thus cannot be 
used",
-   " to transfer changes. This mode is faster and useful 
to",
-   " show the differences in metadata.",
-   "-v|--verbose enable verbose output to stderr, each occurrence of",
-   "   

[PATCH v2 4/6] Btrfs-progs: add write and clone commands debug info to receive

2018-05-27 Thread Howard McLauchlan
From: Filipe Manana 

When -vv is specified, also print information about received write and clone
commands, as we already do for other commands. This is very useful for
debugging and troubleshooting.

Signed-off-by: Filipe David Borba Manana 
---
 cmds-receive.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/cmds-receive.c b/cmds-receive.c
index 510b6bc8..20e593f7 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -790,6 +790,10 @@ static int process_write(const char *path, const void *data, u64 offset,
u64 pos = 0;
int w;
 
+   if (g_verbose >= 2)
+   fprintf(stderr, "write %s, offset %llu, len %llu\n",
+   path, offset, len);
+
ret = path_cat_out(full_path, rctx->full_subvol_path, path);
if (ret < 0) {
error("write: path invalid: %s", path);
@@ -831,6 +835,11 @@ static int process_clone(const char *path, u64 offset, u64 len,
char full_clone_path[PATH_MAX];
int clone_fd = -1;
 
+   if (g_verbose >= 2)
+   fprintf(stderr,
+   "clone %s, offset %llu, len %llu, clone path %s, clone offset %llu\n",
+   path, offset, len, clone_path, clone_offset);
+
ret = path_cat_out(full_path, rctx->full_subvol_path, path);
if (ret < 0) {
error("clone: source path invalid: %s", path);
-- 
2.17.0



[RFC PATCH v2 3/6] btrfs: send, use fallocate command to punch holes

2018-05-27 Thread Howard McLauchlan
From: Filipe David Borba Manana 

Instead of sending a write command with a data buffer filled with 0 value bytes,
use the fallocate command, introduced in the send stream version 2, to tell the
receiver to punch a file hole using the fallocate system call.

[Howard: rebased on 4.17-rc7]
Signed-off-by: Howard McLauchlan 
Signed-off-by: Filipe David Borba Manana 
---
 fs/btrfs/send.c | 54 ++---
 fs/btrfs/send.h |  4 
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 7b184831812b..328c7a2857ae 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -581,6 +581,7 @@ static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)
return tlv_put(sctx, attr, &__tmp, sizeof(__tmp));  \
}
 
+TLV_PUT_DEFINE_INT(32)
 TLV_PUT_DEFINE_INT(64)
 
 static int tlv_put_string(struct send_ctx *sctx, u16 attr,
@@ -5047,17 +5048,57 @@ static int send_update_extent(struct send_ctx *sctx,
return ret;
 }
 
+static int send_fallocate(struct send_ctx *sctx, u32 flags,
+ u64 offset, u64 len)
+{
+   struct fs_path *p = NULL;
+   int ret = 0;
+
+   ASSERT(sctx->flags & BTRFS_SEND_FLAG_STREAM_V2);
+
+   if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE) {
+   sctx->total_data_size += len;
+   return 0;
+   }
+
+   p = fs_path_alloc();
+   if (!p)
+   return -ENOMEM;
+   ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p);
+   if (ret < 0)
+   goto out;
+
+   ret = begin_cmd(sctx, BTRFS_SEND_C_FALLOCATE);
+   if (ret < 0)
+   goto out;
+   TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p);
+   TLV_PUT_U32(sctx, BTRFS_SEND_A_FALLOCATE_FLAGS, flags);
+   TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset);
+   TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len);
+   ret = send_cmd(sctx);
+
+tlv_put_failure:
+out:
+   fs_path_free(p);
+   return ret;
+}
+
+
 static int send_hole(struct send_ctx *sctx, u64 end)
 {
struct fs_path *p = NULL;
u64 offset = sctx->cur_inode_last_extent;
-   u64 len;
+   u64 len = end - offset;
int ret = 0;
 
if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE) {
-   sctx->total_data_size += end - offset;
+   sctx->total_data_size += len;
return 0;
}
+   if (sctx->flags & BTRFS_SEND_FLAG_STREAM_V2)
+   return send_fallocate(sctx, BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS,
+ offset, len);
+
if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA)
return send_update_extent(sctx, offset, end - offset);
 
@@ -5304,7 +5345,8 @@ static int send_write_or_clone(struct send_ctx *sctx,
ret = 0;
goto out;
}
-   if (offset + len > sctx->cur_inode_size)
+   if (offset < sctx->cur_inode_size &&
+   offset + len > sctx->cur_inode_size)
len = sctx->cur_inode_size - offset;
if (len == 0) {
ret = 0;
@@ -5325,6 +5367,12 @@ static int send_write_or_clone(struct send_ctx *sctx,
data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
ret = clone_range(sctx, clone_root, disk_byte, data_offset,
  offset, len);
+   } else if (btrfs_file_extent_disk_bytenr(path->nodes[0], ei) == 0 &&
+type != BTRFS_FILE_EXTENT_INLINE &&
+(sctx->flags & BTRFS_SEND_FLAG_STREAM_V2) &&
+offset < sctx->cur_inode_size) {
+   ret = send_fallocate(sctx, BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS,
+offset, len);
} else {
ret = send_extent_data(sctx, offset, len);
}
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index 152180304078..b6e9281f171a 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -139,6 +139,10 @@ enum {
 #define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE   (1 << 0)
 #define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE  (1 << 1)
 
+#define BTRFS_SEND_PUNCH_HOLE_FALLOC_FLAGS\
+   (BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE |  \
+BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE)
+
 #ifdef __KERNEL__
 long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg);
 #endif
-- 
2.17.0
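
On the receiving side, a fallocate command like the one emitted by send_fallocate has to be translated into a fallocate(2) call. The sketch below shows one way this could look. It is an assumption about the receiver, not code from this series: the helper names and the SEND_FALLOC_* macros are hypothetical, with the bit values copied from the send.h hunk above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdint.h>
#include <sys/types.h>

/* Stream flag bits, with the values from the send.h hunk above. */
#define SEND_FALLOC_KEEP_SIZE  (1u << 0)
#define SEND_FALLOC_PUNCH_HOLE (1u << 1)

/* Translate stream flags into fallocate(2) mode bits. */
static int stream_flags_to_mode(uint32_t flags)
{
	int mode = 0;

	if (flags & SEND_FALLOC_KEEP_SIZE)
		mode |= FALLOC_FL_KEEP_SIZE;
	if (flags & SEND_FALLOC_PUNCH_HOLE)
		mode |= FALLOC_FL_PUNCH_HOLE;
	return mode;
}

/* Apply one fallocate command to an open file; returns 0 or -errno. */
static int apply_fallocate(int fd, uint32_t flags, off_t offset, off_t len)
{
	assert(offset >= 0 && len > 0);

	if (fallocate(fd, stream_flags_to_mode(flags), offset, len) < 0)
		return -errno;
	return 0;
}
```

For hole punching, both bits are set, which matches the requirement of fallocate(2) that FALLOC_FL_PUNCH_HOLE be combined with FALLOC_FL_KEEP_SIZE.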



[RFC PATCH v2 4/6] btrfs: send, use fallocate command to allocate extents

2018-05-27 Thread Howard McLauchlan
From: Filipe David Borba Manana 

The send stream version 2 adds the fallocate command, which can be used to
allocate extents for a file or punch holes in a file. Previously we were
ignoring file prealloc extents or treating them as extents filled with 0
bytes and sending a regular write command to the stream.

After this change, together with my previous change titled:

"Btrfs: send, use fallocate command to punch holes"

an incremental send preserves the hole and data structure of files, which can
be seen via calls to lseek with the whence parameter set to SEEK_DATA or
SEEK_HOLE, as the example below shows:

mkfs.btrfs -f /dev/sdc
mount /dev/sdc /mnt
xfs_io -f -c "pwrite -S 0x01 -b 300000 0 300000" /mnt/foo
btrfs subvolume snapshot -r /mnt /mnt/mysnap1

xfs_io -c "fpunch 100000 50000" /mnt/foo
xfs_io -c "falloc 100000 50000" /mnt/foo
xfs_io -c "pwrite -S 0xff -b 1000 120000 1000" /mnt/foo
xfs_io -c "fpunch 250000 20000" /mnt/foo

# prealloc extents that start beyond the inode's size
xfs_io -c "falloc -k 300000 1000000" /mnt/foo
xfs_io -c "falloc -k 9000000 2000000" /mnt/foo

btrfs subvolume snapshot -r /mnt /mnt/mysnap2

btrfs send /mnt/mysnap1 -f /tmp/1.snap
btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/2.snap

mkfs.btrfs -f /dev/sdd
mount /dev/sdd /mnt2
btrfs receive /mnt2 -f /tmp/1.snap
btrfs receive /mnt2 -f /tmp/2.snap

Before this change the hole/data structure differed between both filesystems:

$ xfs_io -r -c 'seek -r -a 0' /mnt/mysnap2/foo
Whence  Result
DATA    0
HOLE    102400
DATA    118784
HOLE    122880
DATA    147456
HOLE    253952
DATA    266240
HOLE    300000

$ xfs_io -r -c 'seek -r -a 0' /mnt2/mysnap2/foo
Whence  Result
DATA    0
HOLE    300000

After this change the second filesystem (/dev/sdd) ends up with the same
hole/data structure as the first filesystem.

Also, after this change, prealloc extents that lie beyond the inode's size (were
allocated with fallocate + keep size flag) are also replicated by an incremental
send. For the above test, it can be observed via fiemap (or btrfs-debug-tree):

$ xfs_io -r -c 'fiemap -l' /mnt2/mysnap2/foo
0: [0..191]: 25096..25287 192 blocks
1: [192..199]: 24672..24679 8 blocks
2: [200..231]: 24584..24615 32 blocks
3: [232..239]: 24680..24687 8 blocks
4: [240..287]: 24616..24663 48 blocks
5: [288..295]: 24688..24695 8 blocks
6: [296..487]: 25392..25583 192 blocks
7: [488..495]: 24696..24703 8 blocks
8: [496..519]: hole 24 blocks
9: [520..527]: 24704..24711 8 blocks
10: [528..583]: 25624..25679 56 blocks
11: [584..591]: 24712..24719 8 blocks
12: [592..2543]: 26192..28143 1952 blocks
13: [2544..17575]: hole 15032 blocks
14: [17576..21487]: 28144..32055 3912 blocks

The proposed xfstest can be found at:

xfstests: btrfs, test send's ability to punch holes and prealloc extents

This test verifies that send-stream version 2 does space pre-allocation
and hole punching.
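
The hole/data comparison described above relies on lseek with SEEK_DATA and SEEK_HOLE. As a rough illustration of what such a walk looks like, here is a minimal sketch. It is not the xfs_io implementation, and print_data_segments is a hypothetical helper. It prints each data segment of an open file and returns the number of segments found.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: print every data segment of an open file as
 * "DATA start..end" and return the number of segments, or -errno.
 */
static int print_data_segments(int fd)
{
	off_t end = lseek(fd, 0, SEEK_END);
	off_t pos = 0;
	int segments = 0;

	if (end < 0)
		return -errno;

	while (pos < end) {
		off_t data = lseek(fd, pos, SEEK_DATA);
		off_t hole;

		if (data < 0) {
			if (errno == ENXIO)	/* only a hole remains up to EOF */
				break;
			return -errno;
		}

		/* There is always an implicit hole at end of file. */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -errno;
		assert(hole > data);

		printf("DATA %lld..%lld\n", (long long)data, (long long)hole);
		segments++;
		pos = hole;
	}
	return segments;
}
```

Running this on the same file in the source and the received snapshot and comparing the output gives a check similar to the one in the commit message. Filesystems without hole support report the whole file as a single data segment.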

[Howard: rebased on 4.17-rc7]
Signed-off-by: Howard McLauchlan 
Signed-off-by: Filipe David Borba Manana 
---
 fs/btrfs/send.c | 70 -
 1 file changed, 52 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 328c7a2857ae..84dacb20d832 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -98,9 +98,10 @@ struct send_ctx {
 */
u64 cur_ino;
u64 cur_inode_gen;
-   int cur_inode_new;
-   int cur_inode_new_gen;
-   int cur_inode_deleted;
+   u8 cur_inode_new:1;
+   u8 cur_inode_new_gen:1;
+   u8 cur_inode_skip_truncate:1;
+   u8 cur_inode_deleted:1;
u64 cur_inode_size;
u64 cur_inode_mode;
u64 cur_inode_rdev;
@@ -5313,6 +5314,19 @@ static int clone_range(struct send_ctx *sctx,
return ret;
 }
 
+static int truncate_before_falloc(struct send_ctx *sctx)
+{
+   int ret = 0;
+
+   if (!sctx->cur_inode_skip_truncate) {
+   ret = send_truncate(sctx, sctx->cur_ino,
+   sctx->cur_inode_gen,
+   sctx->cur_inode_size);
+   sctx->cur_inode_skip_truncate = 1;
+   }
+   return ret;
+}
+
 static int send_write_or_clone(struct send_ctx *sctx,
   struct btrfs_path *path,
   struct btrfs_key *key,
@@ -5354,8 +5368,7 @@ static int send_write_or_clone(struct send_ctx *sctx,
}
 
if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE) {
-   if (offset < sctx->cur_inode_size)
-   sctx->total_data_size += len;
+   sctx->total_data_size += len;
goto out;
}
 
@@ -5373,6 +5386,21 @@ static int send_write_or_clone(struct send_ctx *sctx,
 offset < sctx->cu

[PATCH v2 5/6] btrfs-progs: add total data size, fallocate to dump

2018-05-27 Thread Howard McLauchlan
From: Howard McLauchlan 

Adding entries to dump for new commands (total data size, fallocate).

Signed-off-by: Howard McLauchlan 
---
 send-dump.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/send-dump.c b/send-dump.c
index 1591e0cc..c5a695a2 100644
--- a/send-dump.c
+++ b/send-dump.c
@@ -316,6 +316,21 @@ static int print_update_extent(const char *path, u64 offset, u64 len,
  offset, len);
 }
 
+static int print_total_data_size(u64 size, void *user)
+{
+   char path = '\0';
+
+   return PRINT_DUMP(user, &path, "total_data_size", "size=%llu", size);
+}
+
+static int print_fallocate(const char *path, u32 flags, u64 offset, u64 len,
+  void *user)
+{
+   return PRINT_DUMP(user, path, "fallocate",
+ "flags=%u offset=%llu len=%llu", flags, offset,
+ len);
+}
+
 struct btrfs_send_ops btrfs_print_send_ops = {
.subvol = print_subvol,
.snapshot = print_snapshot,
@@ -337,5 +352,7 @@ struct btrfs_send_ops btrfs_print_send_ops = {
.chmod = print_chmod,
.chown = print_chown,
.utimes = print_utimes,
-   .update_extent = print_update_extent
+   .update_extent = print_update_extent,
+   .total_data_size = print_total_data_size,
+   .fallocate = print_fallocate,
 };
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH v2 1/6] btrfs: send, bump stream version

2018-05-27 Thread Howard McLauchlan
From: Filipe David Borba Manana 

This increases the send stream version from version 1 to version 2, adding
new commands:

1) total data size - used to tell the receiver how much file data the stream
   will add or update;

2) fallocate - used to pre-allocate space for files and to punch holes in files;

3) inode set flags (chattr);

4) set inode otime;

5) sending compressed writes.

This is preparation work for subsequent changes that implement the new features.

A version 2 stream is only produced if the send ioctl caller passes in one of
the new flags (BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE | BTRFS_SEND_FLAG_STREAM_V2),
meaning old clients are unaffected.
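
The version selection described above can be sketched in isolation. This is a
minimal userspace illustration, not the kernel code: the constants are copied
from the diff below, while stream_version_for_flags() is a hypothetical helper
that folds together the checks the patch places in btrfs_ioctl_send() and
send_header().

```c
#include <stdint.h>

/* Values copied from the patch; everything else here is illustrative. */
#define BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE 0x8
#define BTRFS_SEND_FLAG_STREAM_V2           0x10

#define BTRFS_SEND_STREAM_VERSION_1 1
#define BTRFS_SEND_STREAM_VERSION_2 2

/* Hypothetical helper: the stream version a given ioctl flag word selects. */
uint32_t stream_version_for_flags(uint64_t flags)
{
	/* Asking for the total data size implies a version 2 stream. */
	if (flags & BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE)
		flags |= BTRFS_SEND_FLAG_STREAM_V2;

	if (flags & BTRFS_SEND_FLAG_STREAM_V2)
		return BTRFS_SEND_STREAM_VERSION_2;

	/* Callers that pass none of the new flags keep getting version 1. */
	return BTRFS_SEND_STREAM_VERSION_1;
}
```

A caller that passes only pre-existing flags (for example 0x4,
BTRFS_SEND_FLAG_OMIT_END_CMD) still gets a version 1 stream in this sketch.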

[Howard: rebased on 4.17-rc7]
Signed-off-by: Howard McLauchlan 
Signed-off-by: Filipe David Borba Manana 
Reviewed-by: Omar Sandoval 
---
 fs/btrfs/send.c|  7 ++-
 fs/btrfs/send.h| 22 +-
 include/uapi/linux/btrfs.h | 21 -
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index c0074d2d7d6d..eccd69387065 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -649,7 +649,10 @@ static int send_header(struct send_ctx *sctx)
struct btrfs_stream_header hdr;
 
strcpy(hdr.magic, BTRFS_SEND_STREAM_MAGIC);
-   hdr.version = cpu_to_le32(BTRFS_SEND_STREAM_VERSION);
+   if (sctx->flags & BTRFS_SEND_FLAG_STREAM_V2)
+   hdr.version = cpu_to_le32(BTRFS_SEND_STREAM_VERSION_2);
+   else
+   hdr.version = cpu_to_le32(BTRFS_SEND_STREAM_VERSION_1);
 
return write_buf(sctx->send_filp, &hdr, sizeof(hdr),
&sctx->send_off);
@@ -6535,6 +6538,8 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
INIT_LIST_HEAD(&sctx->name_cache_list);
 
sctx->flags = arg->flags;
+   if (sctx->flags & BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE)
+   sctx->flags |= BTRFS_SEND_FLAG_STREAM_V2;
 
sctx->send_filp = fget(arg->send_fd);
if (!sctx->send_filp) {
diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index ead397f7034f..152180304078 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -10,7 +10,8 @@
 #include "ctree.h"
 
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
-#define BTRFS_SEND_STREAM_VERSION 1
+#define BTRFS_SEND_STREAM_VERSION_1 1
+#define BTRFS_SEND_STREAM_VERSION_2 2
 
 #define BTRFS_SEND_BUF_SIZE SZ_64K
 #define BTRFS_SEND_READ_SIZE (48 * SZ_1K)
@@ -77,6 +78,16 @@ enum btrfs_send_cmd {
 
BTRFS_SEND_C_END,
BTRFS_SEND_C_UPDATE_EXTENT,
+
+   /*
+* The following commands were added in stream version 2.
+*/
+   BTRFS_SEND_C_TOTAL_DATA_SIZE,
+   BTRFS_SEND_C_FALLOCATE,
+   BTRFS_SEND_C_CHATTR,
+   BTRFS_SEND_C_UTIMES2, /* Same as UTIMES, but it includes OTIME too. */
+   BTRFS_SEND_C_WRITE_COMPRESSED, /* to be implemented */
+
__BTRFS_SEND_C_MAX,
 };
 #define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1)
@@ -115,10 +126,19 @@ enum {
BTRFS_SEND_A_CLONE_OFFSET,
BTRFS_SEND_A_CLONE_LEN,
 
+   /*
+* The following attributes were added in stream version 2.
+*/
+   BTRFS_SEND_A_FALLOCATE_FLAGS,
+   BTRFS_SEND_A_CHATTR,
+
__BTRFS_SEND_A_MAX,
 };
 #define BTRFS_SEND_A_MAX (__BTRFS_SEND_A_MAX - 1)
 
+#define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE   (1 << 0)
+#define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE  (1 << 1)
+
 #ifdef __KERNEL__
 long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg);
 #endif
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index c8d99b9ca550..ed63176660d2 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -711,10 +711,29 @@ struct btrfs_ioctl_received_subvol_args {
  */
 #define BTRFS_SEND_FLAG_OMIT_END_CMD   0x4
 
+/*
+ * Calculate the amount (in bytes) of new file data between the send and
+ * parent snapshots, or in case of a full send, the total amount of file data
+ * we will send.
+ * This corresponds to the sum of the data lengths of each write, clone and
+ * fallocate commands that are sent through the send stream. The receiving end
+ * can use this information to compute progress.
+ *
+ * Added in send stream version 2, and implies producing a version 2 stream.
+ */
+#define BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE    0x8
+
+/*
+ * Used by a client to request a version 2 of the send stream.
+ */
+#define BTRFS_SEND_FLAG_STREAM_V2  0x10
+
 #define BTRFS_SEND_FLAG_MASK \
(BTRFS_SEND_FLAG_NO_FILE_DATA | \
 BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \
-BTRFS_SEND_FLAG_OMIT_END_CMD)
+BTRFS_SEND_FLAG_OMIT_END_CMD | \
+BTRFS_SEND_FLAG_CALCULATE_DATA_SIZE | \
+BTRFS_SEND_FLAG_STREAM_V2)
 
 struct btrfs_ioctl_send_args {
__s64 send_fd;  /* in */
-- 
2.17.0


[PATCH v2 2/6] Btrfs-progs: send, implement total data size callback and progress report

2018-05-27 Thread Howard McLauchlan
From: Filipe Manana 

This is a followup to the kernel patch titled:

Btrfs: send, implement total data size command to allow for progress estimation

This makes the btrfs send and receive commands aware of the new send command,
named BTRFS_SEND_C_TOTAL_DATA_SIZE, which tells us the amount of file data
that is new between the parent and send snapshots/roots. As this command
immediately follows the commands to start a snapshot/subvolume, it can be
used to compute and report progress, by keeping a counter that is incremented
with the data length of each write, clone and fallocate command that is received
from the stream.
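
The counter-based accounting can be sketched as follows. This is a simplified
illustration under stated assumptions, not the code from the diff below: the
struct and the function progress_account() are hypothetical, and only the two
field names are borrowed from the patch.

```c
#include <stdint.h>

/* Hypothetical receiver-side state; field names mirror the patch. */
struct progress {
	uint64_t total_data_size; /* value from the total data size command */
	uint64_t bytes_received;  /* sum of write/clone/fallocate lengths */
};

/*
 * Account for one write, clone or fallocate command of `len` bytes.
 * Returns the percentage completed, or -1.0 when the stream announced
 * no total (in which case nothing is counted).
 */
double progress_account(struct progress *p, uint64_t len)
{
	if (p->total_data_size == 0)
		return -1.0;

	p->bytes_received += len;
	return (double)p->bytes_received / (double)p->total_data_size * 100.0;
}
```

The early return mirrors the patch's behaviour of skipping progress updates
when no total data size was received, for example with a version 1 stream.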

Example:

$ btrfs send -s --stream-version 2 /mnt/sdd/snap_base | btrfs receive /mnt/sdc
At subvol /mnt/sdd/snap_base
At subvol snap_base
About to receive 9212392667 bytes
Subvolume /mnt/sdc//snap_base, 4059722426 / 9212392667 bytes received, 44.07%, 40.32MB/s

$ btrfs send -s --stream-version 2 -p /mnt/sdd/snap_base /mnt/sdd/snap_incr | btrfs receive /mnt/sdc
At subvol /mnt/sdd/snap_incr
At subvol snap_incr
About to receive 9571342213 bytes
Subvolume /mnt/sdc//snap_incr, 6557345221 / 9571342213 bytes received, 68.51%, 51.04MB/s

At the moment progress is only reported by btrfs-receive, but it would be simple
to do the same for btrfs-send, so that a progress report is also available when
btrfs-send output is not piped to btrfs-receive (for example, when it is written
directly to a file).

Signed-off-by: Filipe David Borba Manana 
---
 cmds-receive.c | 91 ++
 cmds-send.c| 23 +++--
 send-stream.c  |  4 +++
 send-stream.h  |  1 +
 4 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index 68123a31..d8ff5194 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -79,6 +80,14 @@ struct btrfs_receive
 
int honor_end_cmd;
 
+   /* For the subvolume/snapshot we're currently receiving. */
+   u64 total_data_size;
+   u64 bytes_received;
+   time_t last_progress_update;
+   u64 bytes_received_last_update;
+   float progress;
+   const char *target;
+
/*
 * Buffer to store capabilities from security.capabilities xattr,
 * usually 20 bytes, but make same room for potentially larger
@@ -156,6 +165,16 @@ out:
return ret;
 }
 
+static void reset_progress(struct btrfs_receive *rctx, const char *dest)
+{
+   rctx->total_data_size = 0;
+   rctx->bytes_received = 0;
+   rctx->progress = 0.0;
+   rctx->last_progress_update = 0;
+   rctx->bytes_received_last_update = 0;
+   rctx->target = dest;
+}
+
 static int process_subvol(const char *path, const u8 *uuid, u64 ctransid,
  void *user)
 {
@@ -180,6 +199,7 @@ static int process_subvol(const char *path, const u8 *uuid, u64 ctransid,
ret = -EINVAL;
goto out;
}
+   reset_progress(rctx, "Subvolume");
 
if (*rctx->dest_dir_path == 0) {
strncpy_null(rctx->cur_subvol_path, path);
@@ -249,6 +269,7 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
ret = -EINVAL;
goto out;
}
+   reset_progress(rctx, "Snapshot");
 
if (*rctx->dest_dir_path == 0) {
strncpy_null(rctx->cur_subvol_path, path);
@@ -388,6 +409,73 @@ out:
return ret;
 }
 
+static int process_total_data_size(u64 size, void *user)
+{
+   struct btrfs_receive *rctx = user;
+
+   rctx->total_data_size = size;
+   fprintf(stdout, "About to receive %llu bytes\n", size);
+
+   return 0;
+}
+
+static void update_progress(struct btrfs_receive *rctx, u64 bytes)
+{
+   float new_progress;
+   time_t now;
+   time_t tdiff;
+
+   if (rctx->total_data_size == 0)
+   return;
+
+   rctx->bytes_received += bytes;
+
+   now = time(NULL);
+   tdiff = now - rctx->last_progress_update;
+   if (tdiff < 1) {
+   if (rctx->bytes_received == rctx->total_data_size)
+   fprintf(stdout, "\n");
+   return;
+   }
+
+   new_progress = ((float)rctx->bytes_received / rctx->total_data_size) * 100.0;
+
+   if ((int)(new_progress * 100) > (int)(rctx->progress * 100) ||
+   rctx->bytes_received == rctx->total_data_size) {
+   char line[5000];
+   float rate = rctx->bytes_received - rctx->bytes_received_last_update;
+   const char *rate_units;
+
+   rate /= tdiff;
+   if (rate > (1024 * 1024)) {
+   rate_units = "MB/s";
+   rate /= 1024 * 1024;
+   } else if (rate > 1024) {
+   rate_units = "KB/s";
+   rate /= 1024;
+   } else {
+   rate_units = "B/s";
+ 

[PATCH v2 2/2] xfstests: btrfs, test send's ability to punch holes, prealloc extents and send data size

2018-05-27 Thread Howard McLauchlan
From: Filipe Manana 

This test verifies that after an incremental btrfs send the
replicated file has exactly the same hole and data structure as in
the origin filesystem. This was not the case before send stream
version 2: holes were sent as write operations of 0 valued bytes
instead of punching holes with the fallocate system call, and
pre-allocated extents were also sent as write operations of 0
valued bytes instead of instructions for the receiver to use the
fallocate system call.

It also checks that prealloc extents that lie beyond the file's
size are replicated by an incremental send.

Also update the existing test btrfs/161 to verify the total data size.

[Howard: rebased on kernel v4.17-rc7, btrfs progs v4.16.1]
Signed-off-by: Howard McLauchlan 
Signed-off-by: Filipe David Borba Manana 
---
 common/rc   |  10 
 tests/btrfs/161 |  10 ++--
 tests/btrfs/162 | 121 
 tests/btrfs/162.out |  42 +++
 tests/btrfs/group   |   1 +
 5 files changed, 179 insertions(+), 5 deletions(-)
 create mode 100755 tests/btrfs/162
 create mode 100644 tests/btrfs/162.out

diff --git a/common/rc b/common/rc
index ffe53236..3bc15fef 100644
--- a/common/rc
+++ b/common/rc
@@ -3788,6 +3788,16 @@ _require_scratch_feature()
esac
 }
 
+_require_btrfs_stream_version_2()
+{
+   $BTRFS_UTIL_PROG send 2>&1 | \
+   grep '^[ \t]*\--stream-version[ \t]\+.*' > /dev/null 2>&1
+   if [ $? -ne 0 ]; then
+   _notrun "Missing btrfs-progs send --stream-version command line option, skipped this test"
+   fi
+}
+
+
 init_rc
 
 

diff --git a/tests/btrfs/161 b/tests/btrfs/161
index 6c30a5e2..3f1e9868 100755
--- a/tests/btrfs/161
+++ b/tests/btrfs/161
@@ -94,7 +94,7 @@ $CHATTR_PROG +a $SCRATCH_MNT/parent/foo
 # Send/Receive initial snapshot
 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/parent \
$SCRATCH_MNT/old_parent
-_run_btrfs_util_prog send --stream-version 2 -f $SCRATCH_MNT/out $SCRATCH_MNT/old_parent
+_run_btrfs_util_prog send -s --stream-version 2 -f $SCRATCH_MNT/out $SCRATCH_MNT/old_parent
 _run_btrfs_util_prog receive -f $SCRATCH_MNT/out $SCRATCH_MNT/receive
 
 # Verify post-send content and flags
@@ -111,7 +111,7 @@ $CHATTR_PROG +a $SCRATCH_MNT/parent/foo
 # Send incremental change
 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/parent \
$SCRATCH_MNT/child_1
-_run_btrfs_util_prog send --stream-version 2 -p $SCRATCH_MNT/old_parent -f $SCRATCH_MNT/out \
+_run_btrfs_util_prog send -s --stream-version 2 -p $SCRATCH_MNT/old_parent -f $SCRATCH_MNT/out \
 $SCRATCH_MNT/child_1
 _run_btrfs_util_prog receive -f $SCRATCH_MNT/out $SCRATCH_MNT/receive
 
@@ -129,7 +129,7 @@ $CHATTR_PROG +d $SCRATCH_MNT/parent/foo
 # Send incremental change
 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/parent \
$SCRATCH_MNT/child_2
-_run_btrfs_util_prog send --stream-version 2 -p $SCRATCH_MNT/old_parent -f $SCRATCH_MNT/out \
+_run_btrfs_util_prog send -s --stream-version 2 -p $SCRATCH_MNT/old_parent -f $SCRATCH_MNT/out \
 $SCRATCH_MNT/child_2
 _run_btrfs_util_prog receive -f $SCRATCH_MNT/out $SCRATCH_MNT/receive
 
@@ -146,7 +146,7 @@ $CHATTR_PROG +a $SCRATCH_MNT/parent/foo
 # Send incremental change against child_2
 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/parent \
$SCRATCH_MNT/child_3
-_run_btrfs_util_prog send --stream-version 2 -p $SCRATCH_MNT/child_2 -f $SCRATCH_MNT/out \
+_run_btrfs_util_prog send -s --stream-version 2 -p $SCRATCH_MNT/child_2 -f $SCRATCH_MNT/out \
 $SCRATCH_MNT/child_3
 _run_btrfs_util_prog receive -f $SCRATCH_MNT/out $SCRATCH_MNT/receive
 
@@ -166,7 +166,7 @@ $CHATTR_PROG +a $SCRATCH_MNT/parent/foo
 # Send incremental change
 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/parent \
$SCRATCH_MNT/child_4
-_run_btrfs_util_prog send --stream-version 2 -p $SCRATCH_MNT/old_parent -f $SCRATCH_MNT/out \
+_run_btrfs_util_prog send -s --stream-version 2 -p $SCRATCH_MNT/old_parent -f $SCRATCH_MNT/out \
 $SCRATCH_MNT/child_4
 _run_btrfs_util_prog receive -f $SCRATCH_MNT/out $SCRATCH_MNT/receive
 
diff --git a/tests/btrfs/162 b/tests/btrfs/162
new file mode 100755
index ..6789b68c
--- /dev/null
+++ b/tests/btrfs/162
@@ -0,0 +1,121 @@
+#! /bin/bash
+# FS QA Test No. btrfs/162
+#
+# Verify that after an incremental btrfs send the replicated file has
+# the same exact hole and data structure as in the origin filesystem.
+# This didn't use to be the case before the send stream version 2 -
+# holes were sent as write operations of 0 valued bytes instead of punching
+# holes with the fallocate system call, and pre-allocated extents were sent
+# as well as write operations of 0 valued bytes instead of instructions for
+# the receiver to use the fallocate system call. Also check that prealloc
+# extents that lie beyond the file's size are replicated by an incremental
+#

[RFC PATCH v2 5/6] btrfs: add send_stream_version attribute to sysfs

2018-05-27 Thread Howard McLauchlan
From: Filipe David Borba Manana 

Add a sysfs attribute so that applications can find out the highest send
stream version supported by the running kernel:

$ cat /sys/fs/btrfs/send/stream_version
2
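
An application could consume this attribute roughly as sketched below. This is
an illustration only: read_stream_version() is a hypothetical helper, and
treating a missing or malformed file as 0 (meaning "attribute not available")
is an assumption of the sketch, not something the patch defines.

```c
#include <stdio.h>

/*
 * Read a decimal version number from `path`.
 * Returns the parsed version, or 0 if the file is missing or malformed
 * (assumption: the caller then falls back to a version 1 stream).
 */
int read_stream_version(const char *path)
{
	FILE *f = fopen(path, "r");
	int version = 0;

	if (!f)
		return 0;
	if (fscanf(f, "%d", &version) != 1)
		version = 0;
	fclose(f);
	return version;
}
```

With this patch applied, calling it on /sys/fs/btrfs/send/stream_version
would be expected to return 2, matching the cat output above.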

[Howard: rebased on 4.17-rc7]
Reviewed-by: Omar Sandoval 
Signed-off-by: Howard McLauchlan 
Signed-off-by: Filipe David Borba Manana 
Reviewed-by: David Sterba 
---
 fs/btrfs/send.h  |  1 +
 fs/btrfs/sysfs.c | 27 +++
 2 files changed, 28 insertions(+)

diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h
index b6e9281f171a..267674d3e954 100644
--- a/fs/btrfs/send.h
+++ b/fs/btrfs/send.h
@@ -12,6 +12,7 @@
 #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream"
 #define BTRFS_SEND_STREAM_VERSION_1 1
 #define BTRFS_SEND_STREAM_VERSION_2 2
+#define BTRFS_SEND_STREAM_VERSION_LATEST BTRFS_SEND_STREAM_VERSION_2
 
 #define BTRFS_SEND_BUF_SIZE SZ_64K
 #define BTRFS_SEND_READ_SIZE (48 * SZ_1K)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 4848a4318fb5..718bc927ea13 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -18,6 +18,7 @@
 #include "transaction.h"
 #include "sysfs.h"
 #include "volumes.h"
+#include "send.h"
 
 static inline struct btrfs_fs_info *to_fs_info(struct kobject *kobj);
 static inline struct btrfs_fs_devices *to_fs_devs(struct kobject *kobj);
@@ -884,6 +885,26 @@ static int btrfs_init_debugfs(void)
return 0;
 }
 
+static ssize_t send_stream_version_show(struct kobject *kobj,
+   struct kobj_attribute *a,
+   char *buf)
+{
+   return snprintf(buf, PAGE_SIZE, "%d\n",
+   BTRFS_SEND_STREAM_VERSION_LATEST);
+}
+
+BTRFS_ATTR(, stream_version, send_stream_version_show);
+
+static struct attribute *btrfs_send_attrs[] = {
+   BTRFS_ATTR_PTR(, stream_version),
+   NULL
+};
+
+static const struct attribute_group btrfs_send_attr_group = {
+   .name = "send",
+   .attrs = btrfs_send_attrs,
+};
+
 int __init btrfs_init_sysfs(void)
 {
int ret;
@@ -900,8 +921,13 @@ int __init btrfs_init_sysfs(void)
ret = sysfs_create_group(&btrfs_kset->kobj, &btrfs_feature_attr_group);
if (ret)
goto out2;
+   ret = sysfs_create_group(&btrfs_kset->kobj, &btrfs_send_attr_group);
+   if (ret)
+   goto out3;
 
return 0;
+out3:
+   sysfs_remove_group(&btrfs_kset->kobj, &btrfs_feature_attr_group);
 out2:
debugfs_remove_recursive(btrfs_debugfs_root_dentry);
 out1:
@@ -913,6 +939,7 @@ int __init btrfs_init_sysfs(void)
 void __cold btrfs_exit_sysfs(void)
 {
sysfs_remove_group(&btrfs_kset->kobj, &btrfs_feature_attr_group);
+   sysfs_remove_group(&btrfs_kset->kobj, &btrfs_send_attr_group);
kset_unregister(btrfs_kset);
debugfs_remove_recursive(btrfs_debugfs_root_dentry);
 }
-- 
2.17.0



[RFC PATCH v2 6/6] btrfs: add chattr support for send/receive

2018-05-27 Thread Howard McLauchlan
From: Howard McLauchlan 

Presently btrfs send/receive does not propagate inode attribute flags;
all chattr operations are effectively discarded upon transmission.

This patch adds kernel support for inode attribute flags. Userspace
support can be found under the commit:

btrfs-progs: add chattr support for send/receive

An associated xfstest can be found at:

btrfs: add verify chattr support for send/receive test

These changes are only enabled for send stream version 2.

Signed-off-by: Howard McLauchlan 
---
 fs/btrfs/ctree.h |   2 +
 fs/btrfs/ioctl.c |   2 +-
 fs/btrfs/send.c  | 183 +++
 3 files changed, 158 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0d422c9908b8..002fe3ad193a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1461,6 +1461,8 @@ struct btrfs_map_token {
unsigned long offset;
 };
 
+unsigned int btrfs_flags_to_ioctl(unsigned int flags);
+
 #define BTRFS_BYTES_TO_BLKS(fs_info, bytes) \
((bytes) >> (fs_info)->sb->s_blocksize_bits)
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 632e26d6f7ce..36ce1e589f9e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -106,7 +106,7 @@ static unsigned int btrfs_mask_flags(umode_t mode, unsigned int flags)
 /*
  * Export inode flags to the format expected by the FS_IOC_GETFLAGS ioctl.
  */
-static unsigned int btrfs_flags_to_ioctl(unsigned int flags)
+unsigned int btrfs_flags_to_ioctl(unsigned int flags)
 {
unsigned int iflags = 0;
 
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 84dacb20d832..a36a2983b34a 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -108,6 +108,13 @@ struct send_ctx {
u64 cur_inode_last_extent;
u64 cur_inode_next_write_offset;
 
+   /*
+* state for chattr purposes
+*/
+   u64 cur_inode_flip_flags;
+   u64 cur_inode_receive_flags;
+   bool receive_flags_valid;
+
u64 send_progress;
u64 total_data_size;
 
@@ -815,7 +822,7 @@ static int send_rmdir(struct send_ctx *sctx, struct fs_path *path)
  */
 static int __get_inode_info(struct btrfs_root *root, struct btrfs_path *path,
  u64 ino, u64 *size, u64 *gen, u64 *mode, u64 *uid,
- u64 *gid, u64 *rdev)
+ u64 *gid, u64 *rdev, u64 *flags)
 {
int ret;
struct btrfs_inode_item *ii;
@@ -845,6 +852,8 @@ static int __get_inode_info(struct btrfs_root *root, struct btrfs_path *path,
*gid = btrfs_inode_gid(path->nodes[0], ii);
if (rdev)
*rdev = btrfs_inode_rdev(path->nodes[0], ii);
+   if (flags)
+   *flags = btrfs_inode_flags(path->nodes[0], ii);
 
return ret;
 }
@@ -852,7 +861,7 @@ static int __get_inode_info(struct btrfs_root *root, struct btrfs_path *path,
 static int get_inode_info(struct btrfs_root *root,
  u64 ino, u64 *size, u64 *gen,
  u64 *mode, u64 *uid, u64 *gid,
- u64 *rdev)
+ u64 *rdev, u64 *flags)
 {
struct btrfs_path *path;
int ret;
@@ -861,7 +870,7 @@ static int get_inode_info(struct btrfs_root *root,
if (!path)
return -ENOMEM;
ret = __get_inode_info(root, path, ino, size, gen, mode, uid, gid,
-  rdev);
+  rdev, flags);
btrfs_free_path(path);
return ret;
 }
@@ -1250,7 +1259,7 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_)
 * accept clones from these extents.
 */
ret = __get_inode_info(found->root, bctx->path, ino, &i_size, NULL, NULL,
-  NULL, NULL, NULL);
+  NULL, NULL, NULL, NULL);
btrfs_release_path(bctx->path);
if (ret < 0)
return ret;
@@ -1610,7 +1619,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen)
u64 right_gen;
 
ret = get_inode_info(sctx->send_root, ino, NULL, &left_gen, NULL, NULL,
-   NULL, NULL);
+   NULL, NULL, NULL);
if (ret < 0 && ret != -ENOENT)
goto out;
left_ret = ret;
@@ -1619,7 +1628,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen)
right_ret = -ENOENT;
} else {
ret = get_inode_info(sctx->parent_root, ino, NULL, &right_gen,
-   NULL, NULL, NULL, NULL);
+   NULL, NULL, NULL, NULL, NULL);
if (ret < 0 && ret != -ENOENT)
goto out;
right_ret = ret;
@@ -1788,7 +1797,7 @@ static int get_first_ref(struct btrfs_root *root, u64 ino,
 
if (dir_gen) {
ret = get_inode_info(root, parent_dir, NULL, dir_gen, NULL,
-  

[PATCH v2 3/6] Btrfs-progs: send, implement fallocate command callback

2018-05-27 Thread Howard McLauchlan
From: Filipe Manana 

The fallocate send stream command, added in stream version 2, is used to
pre-allocate space for files and punch file holes. This change implements
the callback for that new command, using the fallocate function from the
standard C library to carry out the specified action (allocate file space
or punch a file hole).
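
The flag translation at the heart of the callback can be sketched on its own.
This is a minimal illustration assuming a Linux build environment: the two
stream flag values are the ones defined in the kernel patch of this series,
<linux/falloc.h> supplies the FALLOC_FL_* constants, and
stream_flags_to_falloc_mode() is a hypothetical helper name.

```c
#include <stdint.h>
#include <linux/falloc.h> /* FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE */

/* Stream-level flag values as defined in the kernel patch of this series. */
#define BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE   (1 << 0)
#define BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE  (1 << 1)

/* Hypothetical helper: translate stream flags into a fallocate(2) mode. */
int stream_flags_to_falloc_mode(uint32_t flags)
{
	int mode = 0;

	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE)
		mode |= FALLOC_FL_KEEP_SIZE;
	if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE)
		mode |= FALLOC_FL_PUNCH_HOLE;

	return mode;
}
```

Keeping the stream flags separate from the FALLOC_FL_* values means the stream
format does not depend on the receiver's libc constants. Note that
fallocate(2) requires FALLOC_FL_PUNCH_HOLE to be combined with
FALLOC_FL_KEEP_SIZE, so a hole-punching command is expected to carry both
stream flags.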

Signed-off-by: Filipe David Borba Manana 
---
 cmds-receive.c | 44 
 send-stream.c  | 13 +
 send-stream.h  |  2 ++
 3 files changed, 59 insertions(+)

diff --git a/cmds-receive.c b/cmds-receive.c
index d8ff5194..510b6bc8 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ctree.h"
 #include "ioctl.h"
@@ -1149,6 +1150,48 @@ static int process_update_extent(const char *path, u64 offset, u64 len,
return 0;
 }
 
+static int process_fallocate(const char *path, u32 flags, u64 offset,
+u64 len, void *user)
+{
+   struct btrfs_receive *rctx = user;
+   int mode = 0;
+   int ret;
+   char full_path[PATH_MAX];
+
+   ret = path_cat_out(full_path, rctx->full_subvol_path, path);
+   if (ret < 0) {
+   error("fallocate: path invalid: %s", path);
+   goto out;
+   }
+
+   if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_KEEP_SIZE)
+   mode |= FALLOC_FL_KEEP_SIZE;
+   if (flags & BTRFS_SEND_A_FALLOCATE_FLAG_PUNCH_HOLE)
+   mode |= FALLOC_FL_PUNCH_HOLE;
+
+   if (g_verbose >= 2)
+   fprintf(stderr,
+   "fallocate %s - flags %u, offset %llu, len %llu\n",
+   path, flags, offset, len);
+
+   ret = open_inode_for_write(rctx, full_path);
+   if (ret < 0)
+   goto out;
+
+   ret = fallocate(rctx->write_fd, mode, offset, len);
+   if (ret) {
+   ret = -errno;
+   fprintf(stderr,
+   "ERROR: fallocate against %s failed. %s\n",
+   path, strerror(-ret));
+   goto out;
+   }
+   update_progress(rctx, len);
+
+out:
+   return ret;
+}
+
 static struct btrfs_send_ops send_ops = {
.subvol = process_subvol,
.snapshot = process_snapshot,
@@ -1172,6 +1215,7 @@ static struct btrfs_send_ops send_ops = {
.utimes = process_utimes,
.update_extent = process_update_extent,
.total_data_size = process_total_data_size,
+   .fallocate = process_fallocate,
 };
 
 static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
diff --git a/send-stream.c b/send-stream.c
index d30fd5a7..74ec37dd 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -457,6 +457,19 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx)
TLV_GET_U64(sctx, BTRFS_SEND_A_SIZE, &tmp);
ret = sctx->ops->total_data_size(tmp, sctx->user);
break;
+   case BTRFS_SEND_C_FALLOCATE:
+   {
+   u32 flags;
+   u64 len;
+
+   TLV_GET_STRING(sctx, BTRFS_SEND_A_PATH, &path);
+   TLV_GET_U32(sctx, BTRFS_SEND_A_FALLOCATE_FLAGS, &flags);
+   TLV_GET_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, &offset);
+   TLV_GET_U64(sctx, BTRFS_SEND_A_SIZE, &len);
+   ret = sctx->ops->fallocate(path, flags, offset, len,
+   sctx->user);
+   }
+   break;
case BTRFS_SEND_C_END:
ret = 1;
break;
diff --git a/send-stream.h b/send-stream.h
index 5b244ab6..89e64043 100644
--- a/send-stream.h
+++ b/send-stream.h
@@ -67,6 +67,8 @@ struct btrfs_send_ops {
  void *user);
int (*update_extent)(const char *path, u64 offset, u64 len, void *user);
int (*total_data_size)(u64 size, void *user);
+   int (*fallocate)(const char *path, u32 flags, u64 offset,
+u64 len, void *user);
 };
 
 int btrfs_read_and_process_send_stream(int fd,
-- 
2.17.0



[PATCH v2 1/2] btrfs: add verify chattr support for send/receive test

2018-05-27 Thread Howard McLauchlan
From: Howard McLauchlan 

This test aims to verify correct behaviour with chattr operations and
btrfs send/receive. The intent is to check general correctness as well
as special interactions with troublesome flags (immutable, append-only).

This test is motivated by a btrfs bug: send/receive lacks chattr
support.

A kernel patch to fix this can be found at:

btrfs: add chattr support for send/receive

The accompanying userspace patch can be found at:

btrfs-progs: add chattr support for send/receive

Signed-off-by: Howard McLauchlan 
---
 tests/btrfs/161 | 202 
 tests/btrfs/161.out |  38 +
 tests/btrfs/group   |   1 +
 3 files changed, 241 insertions(+)
 create mode 100755 tests/btrfs/161
 create mode 100644 tests/btrfs/161.out

diff --git a/tests/btrfs/161 b/tests/btrfs/161
new file mode 100755
index ..6c30a5e2
--- /dev/null
+++ b/tests/btrfs/161
@@ -0,0 +1,202 @@
+#! /bin/bash
+# FS QA Test 161
+#
+# This test verifies the correct behaviour of chattr support for btrfs
+# send/receive; 6 cases will be tested:
+# 1. New inode created with an inode flag set
+# 2. Existing inode with BTRFS_INODE_APPEND is written to with sequence:
+#  chattr -a
+#  pwrite something
+#  chattr +a
+# 3. Existing inode with BTRFS_INODE_APPEND is written to with sequence:
+#  chattr -a
+#  pwrite something
+#  chattr +d
+# 4. Existing inode is written to with sequence:
+#  setfattr something
+#  chattr +a
+# 5. Existing inode with BTRFS_INODE_APPEND is written to with sequence:
+#  chattr -a
+#  setfattr something
+#  setfattr something else
+#  chattr +a
+# 6. As above, but with pwrite instead of setfattr
+# The goal of 5 and 6 is not to test correctness, but to ensure we
+# don't send extra chattrs that are unnecessary.
+#
+# We verify the md5sum of the snapshots in the receive directory to ensure file
+# contents have changed appropriately. We also observe the flags changing (or
+# not changing) as appropriate.
+#
+#---
+# Copyright (c) 2018 Facebook.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -rf $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# Create receive directory
+mkdir $SCRATCH_MNT/receive
+
+# Create test file and set chattr flag
+_run_btrfs_util_prog subvolume create $SCRATCH_MNT/parent
+$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 32K" $SCRATCH_MNT/parent/foo | _filter_xfs_io
+$CHATTR_PROG +a $SCRATCH_MNT/parent/foo
+
+# Send/Receive initial snapshot
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/parent \
+   $SCRATCH_MNT/old_parent
+_run_btrfs_util_prog send --stream-version 2 -f $SCRATCH_MNT/out $SCRATCH_MNT/old_parent
+_run_btrfs_util_prog receive -f $SCRATCH_MNT/out $SCRATCH_MNT/receive
+
+# Verify post-send content and flags
+echo "post-send file digest for old_parent:"
+md5sum $SCRATCH_MNT/old_parent/foo | _filter_scratch
+echo "post-send file flag for old_parent:"
+lsattr $SCRATCH_MNT/receive/old_parent/foo | _filter_scratch
+
+# Make change
+$CHATTR_PROG -a $SCRATCH_MNT/parent/foo
+$XFS_IO_PROG -f -c "pwrite -S 0xab 0K 32K" $SCRATCH_MNT/parent/foo | _filter_xfs_io
+$CHATTR_PROG +a $SCRATCH_MNT/parent/foo
+
+# Send incremental change
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/parent \
+   $SCRATCH_MNT/child_1
+_run_btrfs_util_prog send --stream-version 2 -p $SCRATCH_MNT/old_parent -f $SCRATCH_MNT/out \
+$SCRATCH_MNT/child_1
+_run_btrfs_util_prog receive -f $SCRATCH_MNT/out $SCRATCH_MNT

[RFC PATCH v2 2/6] btrfs: send, implement total data size command to allow for progress estimation

2018-05-27 Thread Howard McLauchlan
From: Filipe David Borba Manana 

This new send flag makes send first calculate the amount of new file data (in
bytes) the send root has relative to the parent root, or, for a non-incremental
send, the total amount of file data the stream will create (including holes and
prealloc extents). In other words, it computes the sum of the lengths of all
write, clone and fallocate operations that will be sent through the send stream.

This data size value is sent in a new command, named
BTRFS_SEND_C_TOTAL_DATA_SIZE, that immediately follows a
BTRFS_SEND_C_SUBVOL or BTRFS_SEND_C_SNAPSHOT command and precedes any
command that changes a file or the filesystem hierarchy. Upon receiving a
write, clone or fallocate command, the receiving end can increment a counter
by the data length of that command and therefore report progress by
comparing the counter's value with the data size value received in the
BTRFS_SEND_C_TOTAL_DATA_SIZE command.
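
The receiver-side accounting described above can be sketched roughly as
follows. This is only an illustration, not btrfs-progs code: the struct and
the function names (recv_progress, progress_set_total, progress_add,
progress_percent) are hypothetical, and the sketch assumes the total arrives
once, before any write, clone or fallocate command.

```c
#include <stdint.h>

/* Hypothetical receiver-side state; not taken from btrfs-progs. */
struct recv_progress {
	uint64_t total_bytes;    /* value carried by BTRFS_SEND_C_TOTAL_DATA_SIZE */
	uint64_t received_bytes; /* sum of write/clone/fallocate lengths so far */
};

/* Record the total announced at the start of the stream. */
static void progress_set_total(struct recv_progress *p, uint64_t total)
{
	p->total_bytes = total;
	p->received_bytes = 0;
}

/* Account for one write, clone or fallocate command of len bytes. */
static void progress_add(struct recv_progress *p, uint64_t len)
{
	p->received_bytes += len;
}

/* Percentage done; 0 when no total was announced. */
static double progress_percent(const struct recv_progress *p)
{
	if (p->total_bytes == 0)
		return 0.0;
	return 100.0 * (double)p->received_bytes / (double)p->total_bytes;
}
```

With a total of 9212392667 bytes and 4059722426 bytes accounted for, this
yields roughly 44.07%, which matches the first example output in this
message.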

The approach is simple: before the normal operation of send, scan the file
system tree for new inodes and new/changed file extent items, just as in
send's normal operation, and keep incrementing a counter with the new
inodes' sizes and the sizes of the file extents (and file holes) that are
going to be written, cloned or fallocated. This is actually a simpler and
more lightweight tree scan than the one we do when sending the changes, as
it doesn't process inode references or do any lookups in the extent tree,
for example.
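
As a rough illustration of the two-pass idea, the same scan can be run twice
with a phase flag deciding whether an item is only counted or actually
emitted. Everything in this sketch is a simplified stand-in rather than the
kernel code: the extent array replaces the tree walk, bytes_streamed stands
in for emitting a command, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum phase { PHASE_COMPUTE_DATA_SIZE, PHASE_STREAM_CHANGES };

/* Simplified stand-in for a file extent item found by the scan. */
struct extent {
	uint64_t len;
	int is_new; /* non-zero if new or changed relative to the parent */
};

struct ctx {
	enum phase phase;
	uint64_t total_data_size; /* filled in by the first pass */
	uint64_t bytes_streamed;  /* stand-in for commands sent in the second pass */
};

/* Same callback for both passes; only the phase changes what it does. */
static void process_extent(struct ctx *c, const struct extent *e)
{
	if (!e->is_new)
		return;
	if (c->phase == PHASE_COMPUTE_DATA_SIZE)
		c->total_data_size += e->len;
	else
		c->bytes_streamed += e->len;
}

static void scan(struct ctx *c, const struct extent *ext, size_t n)
{
	for (size_t i = 0; i < n; i++)
		process_extent(c, &ext[i]);
}

/* First pass computes the total, second pass streams the changes. */
static void send_all(struct ctx *c, const struct extent *ext, size_t n)
{
	c->total_data_size = 0;
	c->bytes_streamed = 0;
	c->phase = PHASE_COMPUTE_DATA_SIZE;
	scan(c, ext, n);
	c->phase = PHASE_STREAM_CHANGES;
	scan(c, ext, n);
}
```

Sharing one callback between the passes keeps the counted total and the
streamed data in agreement, since both are derived from the same filter.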

After modifying btrfs-progs to understand this new command and report
progress, here's an example (the -s flag tells btrfs send to pass the new
flag to the kernel's send ioctl):

$ btrfs send -s --stream-version 2 /mnt/sdd/snap_base | btrfs receive /mnt/sdc
At subvol /mnt/sdd/snap_base
At subvol snap_base
About to receive 9212392667 bytes
Subvolume /mnt/sdc//snap_base, 4059722426 / 9212392667 bytes received, 44.07%, 40.32MB/s

$ btrfs send -s --stream-version 2 -p /mnt/sdd/snap_base /mnt/sdd/snap_incr | btrfs receive /mnt/sdc
At subvol /mnt/sdd/snap_incr
At subvol snap_incr
About to receive 9571342213 bytes
Subvolume /mnt/sdc//snap_incr, 6557345221 / 9571342213 bytes received, 68.51%, 51.04MB/s

[Howard: rebased on 4.17-rc7]
Signed-off-by: Howard McLauchlan 
Signed-off-by: Filipe David Borba Manana 
---
 fs/btrfs/send.c | 189 
 1 file changed, 157 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index eccd69387065..7b184831812b 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -66,7 +66,13 @@ struct clone_root {
 #define SEND_CTX_MAX_NAME_CACHE_SIZE 128
 #define SEND_CTX_NAME_CACHE_CLEAN_SIZE (SEND_CTX_MAX_NAME_CACHE_SIZE * 2)
 
+enum btrfs_send_phase {
+   SEND_PHASE_STREAM_CHANGES,
+   SEND_PHASE_COMPUTE_DATA_SIZE,
+};
+
 struct send_ctx {
+   enum btrfs_send_phase phase;
struct file *send_filp;
loff_t send_off;
char *send_buf;
@@ -102,6 +108,7 @@ struct send_ctx {
u64 cur_inode_next_write_offset;
 
u64 send_progress;
+   u64 total_data_size;
 
struct list_head new_refs;
struct list_head deleted_refs;
@@ -709,6 +716,8 @@ static int send_rename(struct send_ctx *sctx,
struct btrfs_fs_info *fs_info = sctx->send_root->fs_info;
int ret;
 
+   ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
+
btrfs_debug(fs_info, "send_rename %s -> %s", from->start, to->start);
 
ret = begin_cmd(sctx, BTRFS_SEND_C_RENAME);
@@ -758,6 +767,8 @@ static int send_unlink(struct send_ctx *sctx, struct fs_path *path)
struct btrfs_fs_info *fs_info = sctx->send_root->fs_info;
int ret;
 
+   ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
+
btrfs_debug(fs_info, "send_unlink %s", path->start);
 
ret = begin_cmd(sctx, BTRFS_SEND_C_UNLINK);
@@ -780,6 +791,7 @@ static int send_rmdir(struct send_ctx *sctx, struct fs_path *path)
 {
struct btrfs_fs_info *fs_info = sctx->send_root->fs_info;
int ret;
+   ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
 
btrfs_debug(fs_info, "send_rmdir %s", path->start);
 
@@ -2419,6 +2431,8 @@ static int send_truncate(struct send_ctx *sctx, u64 ino, u64 gen, u64 size)
int ret = 0;
struct fs_path *p;
 
+   ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
+
btrfs_debug(fs_info, "send_truncate %llu size=%llu", ino, size);
 
p = fs_path_alloc();
@@ -2449,6 +2463,8 @@ static int send_chmod(struct send_ctx *sctx, u64 ino, u64 gen, u64 mode)
int ret = 0;
struct fs_path *p;
 
+   ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
+
btrfs_debug(fs_info, "send_chmod %llu mode=%llu", ino, mode);
 
p = fs_path_alloc();
@@ -2479,6 +2495,8 @@ static int send_chown(struct send_ctx *sctx, u64 ino, u64 gen, u64 uid, u64 gid)
int ret = 0;
st

[RFC PATCH v2 0/6] btrfs send stream version 2

2018-05-27 Thread Howard McLauchlan
This is v2 of send stream version 2. The goal is to provide proper
versioning/compatibility as new features are implemented. v1 can be found here
[1].

v2 adds BTRFS_SEND_C_WRITE_COMPRESSED to patch 1 and style fixes/logic
simplifications to patches 5,6.

v2 also updates btrfs-progs to reflect BTRFS_SEND_C_WRITE_COMPRESSED and updates
the pertinent xfstests to also test the total size command.

As of 4.17-rc7, these changes pass all "send" group xfstests.

Cheers,

1: https://patchwork.kernel.org/patch/10388003/

Filipe David Borba Manana (5):
  btrfs: send, bump stream version
  btrfs: send, implement total data size command to allow for progress
estimation
  btrfs: send, use fallocate command to punch holes
  btrfs: send, use fallocate command to allocate extents
  btrfs: add send_stream_version attribute to sysfs

Howard McLauchlan (1):
  btrfs: add chattr support for send/receive

 fs/btrfs/ctree.h   |   2 +
 fs/btrfs/ioctl.c   |   2 +-
 fs/btrfs/send.c| 495 +++--
 fs/btrfs/send.h|  27 +-
 fs/btrfs/sysfs.c   |  27 ++
 include/uapi/linux/btrfs.h |  21 +-
 6 files changed, 493 insertions(+), 81 deletions(-)

-- 
2.17.0



Re: RAID-1 refuses to balance large drive

2018-05-27 Thread Brad Templeton
BTW, I decided to follow the original double replace strategy
suggested -- replace 6TB with 8TB and replace 4TB with 6TB.  That
should be sure to leave the 2 large drives each with 2TB free once
expanded, and thus able to fully use all space.

However, the first one has been going for 9 hours and is "189.7% done"
and still going.   Some sort of bug in calculating the completion
status, obviously.  With luck 200% will be enough?
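
For reference, the space arithmetic behind this plan can be sketched as
below. It rests on an idealized assumption that is not guaranteed by the
thread: that the allocator can always place the two copies of each chunk on
different devices until space runs out, ignoring metadata overhead and
TB/TiB differences. The function name raid1_usable is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Idealized usable capacity of a two-copy RAID1 profile across devices of
 * mixed sizes. Each chunk needs space on two different devices, so the
 * largest device can never hold more than the others combined.
 */
static uint64_t raid1_usable(const uint64_t *sizes, size_t n)
{
	uint64_t total = 0, largest = 0;

	if (n < 2)
		return 0; /* two copies need at least two devices */

	for (size_t i = 0; i < n; i++) {
		total += sizes[i];
		if (sizes[i] > largest)
			largest = sizes[i];
	}

	uint64_t others = total - largest;
	/* If one device dominates, the remaining devices are the limit. */
	return largest > others ? others : total / 2;
}
```

Under these assumptions, drives of 6, 4 and 4 TB give 7 TB usable, and
drives of 8, 6 and 4 TB give 9 TB usable. Whether balance actually reaches
that layout is the open question in this thread.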

On Sat, May 26, 2018 at 7:21 PM, Brad Templeton  wrote:
> Certainly.  My apologies for not including them before.   As
> described, the disks are reasonably balanced -- not as full as the
> last time.  As such, it might be enough that balance would (slowly)
> free up enough chunks to get things going.  And if I have to, I will
> partially convert to single again.   Certainly btrfs replace seems
> like the most planned and simple path but it will result in a strange
> distribution of the chunks.
>
> Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
>         Total devices 3 FS bytes used 6.11TiB
>         devid    1 size 3.62TiB used 3.47TiB path /dev/sdj2
>         devid    2 size 3.64TiB used 3.49TiB path /dev/sda
>         devid    3 size 5.43TiB used 5.28TiB path /dev/sdi2
>
> Overall:
>     Device size:                  12.70TiB
>     Device allocated:             12.25TiB
>     Device unallocated:          459.95GiB
>     Device missing:                  0.00B
>     Used:                         12.21TiB
>     Free (estimated):            246.35GiB  (min: 246.35GiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB  (used: 1.32MiB)
>
> Data,RAID1: Size:6.11TiB, Used:6.09TiB
>    /dev/sda        3.48TiB
>    /dev/sdi2       5.28TiB
>    /dev/sdj2       3.46TiB
>
> Metadata,RAID1: Size:14.00GiB, Used:12.38GiB
>    /dev/sda        8.00GiB
>    /dev/sdi2       7.00GiB
>    /dev/sdj2      13.00GiB
>
> System,RAID1: Size:32.00MiB, Used:888.00KiB
>    /dev/sdi2      32.00MiB
>    /dev/sdj2      32.00MiB
>
> Unallocated:
>    /dev/sda      153.02GiB
>    /dev/sdi2     154.56GiB
>    /dev/sdj2     152.36GiB
>
>
> On Sat, May 26, 2018 at 7:16 PM, Qu Wenruo  wrote:
>>
>>
>> On 2018-05-27 10:06, Brad Templeton wrote:
>>> Thanks.  These are all things which take substantial fractions of a
>>> day to try, unfortunately.
>>
>> Normally I would suggest just using VM and several small disks (~10G),
>> along with fallocate (the fastest way to use space) to get a basic view
>> of the procedure.
>>
>>> Last time I ended up fixing it in a
>>> fairly kluged way, which was to convert from raid-1 to single long
>>> enough to get enough single blocks that when I converted back to
>>> raid-1 they got distributed to the right drives.
>>
>> Yep, that's the ultimate one-size-fits-all solution.
>> Also, this reminds me of the fact that we could do the
>> RAID1->Single/DUP->Single downgrade in a much, much faster way.
>> I think it's worth considering as a later enhancement.
>>
>>>  But this is, aside
>>> from being a kludge, a procedure with some minor risk.  Of course I am
>>> taking a backup first, but still...
>>>
>>> This strikes me as something that should be a fairly common event --
>>> your raid is filling up, and so you expand it by replacing the oldest
>>> and smallest drive with a new much bigger one.   In the old days of
>>> RAID, you could not do that, you had to grow all drives at the same
>>> time, and this is one of the ways that BTRFS is quite superior.
>>> When I had MD raid, I went through a strange process of always having
>>> a raid 5 that consisted of different sized drives.  The raid-5 was
>>> based on the smallest of the 3 drives, and then the larger ones had
>>> extra space which could either be in raid-1, or more simply was in solo
>>> disk mode and used for less critical data (such as backups and old
>>> archives.)   Slowly, and in a messy way, each time I replaced the
>>> smallest drive, I could then grow the raid 5.  Yuck. BTRFS is so
>>> much better, except for this issue.
>>>
>>> So if somebody has a thought of a procedure that is fairly sure to
>>> work and doesn't involve too many copying passes -- copying 4tb is not
>>> a quick operation -- it is much appreciated and might be a good thing
>>> to add to a wiki page, which I would be happy to do.
>>
>> Anyway, "btrfs fi show" and "btrfs fi usage" would help before any
>> further advice from community.
>>
>> Thanks,
>> Qu
>>
>>>
>>> On Sat, May 26, 2018 at 6:56 PM, Qu Wenruo  wrote:
>>>>
>>>>
>>>> On 2018-05-27 09:49, Brad Templeton wrote:
>>>>> That is what did not work last time.
>>>>>
>>>>> I say I think there can be a "fix" because I hope the goal of BTRFS
>>>>> raid is to be superior to traditional RAID.   That if one replaces a
>>>>> drive, and asks to balance, it figures out what needs to be done to
>>>>> make that work.  I understand that the current balance algorithm may
>>>>> have trouble with that.   In this situation, the ideal result would be
>>>>>