[PATCH v6 3/8] btrfs-progs: dedupe: Add disable support for inband deduplication

2016-03-22 Thread Qu Wenruo
Add disable subcommand for dedupe command group.

Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-dedupe.asciidoc |  5 +
 btrfs-completion|  2 +-
 cmds-dedupe.c   | 42 +
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-dedupe.asciidoc b/Documentation/btrfs-dedupe.asciidoc
index 8ab40ab..28fe05f 100644
--- a/Documentation/btrfs-dedupe.asciidoc
+++ b/Documentation/btrfs-dedupe.asciidoc
@@ -21,6 +21,11 @@ use with caution.
 
 SUBCOMMAND
 --
+*disable* ::
+Disable in-band de-duplication for a filesystem.
++
+This will discard all stored dedupe hashes.
++
 *enable* [options] ::
 Enable in-band de-duplication for a filesystem.
 +
diff --git a/btrfs-completion b/btrfs-completion
index 50f7ea2..9a6c73b 100644
--- a/btrfs-completion
+++ b/btrfs-completion
@@ -40,7 +40,7 @@ _btrfs()
 commands_property='get set list'
 commands_quota='enable disable rescan'
 commands_qgroup='assign remove create destroy show limit'
-commands_dedupe='enable'
+commands_dedupe='enable disable'
 commands_replace='start status cancel'
 
if [[ "$cur" == -* && $cword -le 3 && "$cmd" != "help" ]]; then
diff --git a/cmds-dedupe.c b/cmds-dedupe.c
index d9dcb10..64ac0f2 100644
--- a/cmds-dedupe.c
+++ b/cmds-dedupe.c
@@ -190,9 +190,51 @@ out:
return ret;
 }
 
+static const char * const cmd_dedupe_disable_usage[] = {
+   "btrfs dedupe disable ",
+   "Disable in-band(write time) de-duplication of a btrfs.",
+   NULL
+};
+
+static int cmd_dedupe_disable(int argc, char **argv)
+{
+   struct btrfs_ioctl_dedupe_args dargs;
+   DIR *dirstream;
+   char *path;
+   int fd;
+   int ret;
+
+   if (check_argc_exact(argc, 2))
+       usage(cmd_dedupe_disable_usage);
+
+   path = argv[1];
+   fd = open_file_or_dir(path, &dirstream);
+   if (fd < 0) {
+       error("failed to open file or directory: %s", path);
+       return 1;
+   }
+   memset(&dargs, 0, sizeof(dargs));
+   dargs.cmd = BTRFS_DEDUPE_CTL_DISABLE;
+
+   ret = ioctl(fd, BTRFS_IOC_DEDUPE_CTL, &dargs);
+   if (ret < 0) {
+       error("failed to disable inband deduplication: %s",
+             strerror(errno));
+       ret = 1;
+       goto out;
+   }
+   ret = 0;
+
+out:
+   close_file_or_dir(fd, dirstream);
+   return ret;
+}
+
 const struct cmd_group dedupe_cmd_group = {
dedupe_cmd_group_usage, dedupe_cmd_group_info, {
{ "enable", cmd_dedupe_enable, cmd_dedupe_enable_usage, NULL, 0},
+   { "disable", cmd_dedupe_disable, cmd_dedupe_disable_usage,
+ NULL, 0},
NULL_CMD_STRUCT
}
 };
-- 
2.7.4
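
Note on the control flow in cmd_dedupe_disable(): both the success path and
the ioctl failure path leave through the out: label, so the status stored in
ret before the jump has to be the value the function returns for a failure to
reach the shell. Below is a minimal sketch of that single-exit pattern. It is
not the btrfs-progs code: ctl_fn, ctl_ok, ctl_fail, release and run_subcommand
are hypothetical stand-ins so the example builds without the btrfs headers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for ioctl(fd, BTRFS_IOC_DEDUPE_CTL, &dargs). */
typedef int (*ctl_fn)(int fd);

int ctl_ok(int fd)   { (void)fd; return 0; }
int ctl_fail(int fd) { (void)fd; return -1; }

/* Counts how often the cleanup step ran (stand-in for close_file_or_dir). */
int closed_count;

void release(int fd) { (void)fd; closed_count++; }

/* Returns the process exit status: 0 on success, 1 on failure. */
int run_subcommand(int fd, ctl_fn ctl)
{
    int ret;

    assert(fd >= 0);
    ret = ctl(fd);
    if (ret < 0) {
        fprintf(stderr, "control call failed\n");
        ret = 1;        /* remember the failure ... */
        goto out;
    }
    ret = 0;
out:
    release(fd);        /* cleanup runs on both paths */
    return ret;         /* ... and hand it back to the caller */
}
```

Calling run_subcommand(3, ctl_fail) yields 1 and run_subcommand(3, ctl_ok)
yields 0, with the cleanup step executed once per call in either case.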



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 2/8] btrfs-progs: dedupe: Add enable command for dedupe command group

2016-03-22 Thread Qu Wenruo
Add enable subcommand for dedupe command group.

Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-dedupe.asciidoc | 105 +++-
 btrfs-completion|   6 +-
 cmds-dedupe.c   | 155 
 ioctl.h |   2 +
 4 files changed, 266 insertions(+), 2 deletions(-)

diff --git a/Documentation/btrfs-dedupe.asciidoc b/Documentation/btrfs-dedupe.asciidoc
index 5d63c32..8ab40ab 100644
--- a/Documentation/btrfs-dedupe.asciidoc
+++ b/Documentation/btrfs-dedupe.asciidoc
@@ -21,7 +21,110 @@ use with caution.
 
 SUBCOMMAND
 --
-Nothing yet
+*enable* [options] ::
+Enable in-band de-duplication for a filesystem.
++
+`Options`
++
+-s|--storage-backend 
+Specify de-duplication hash storage backend.
+Supported backends are 'ondisk' and 'inmemory'.
+If not specified, default value is 'inmemory'.
++
+Refer to the *BACKENDS* section for more information.
+
+-b|--blocksize 
+Specify dedupe block size.
+Supported values are powers of 2 from '16K' to '8M'.
+Default value is '128K'.
++
+Refer to the *DEDUPE BLOCK SIZE* section for more information.
+
+-a|--hash-algorithm 
+Specify hash algorithm.
+Currently only 'sha256' is supported.
+
+-l|--limit-hash 
+Specify maximum number of hashes stored in memory.
+Only works for 'inmemory' backend.
+Conflicts with '-m' option.
++
+Only positive values are valid.
+Default value is '32K'.
+
+-m|--limit-memory 
+Specify maximum memory used for hashes.
+Only works for 'inmemory' backend.
+Conflicts with '-l' option.
++
+Only values larger than or equal to '1024' are valid.
+No default value.
++
+NOTE: The memory limit is rounded down to a multiple of the kernel internal
+hash size, so the memory limit shown in 'btrfs dedupe status' may differ
+from the value given here.
+
+WARNING: Too large value for '-l' or '-m' will easily trigger OOM.
+Please use with caution according to system memory or use 'ondisk' backend
+if memory usage is critical.
+
+BACKENDS
+
+Btrfs in-band de-duplication support two different backends with their own
+features.
+
+In-memory backend::
+This backend provides backward compatibility and more fine-tuning options.
+However, its hash pool is non-persistent and may exhaust kernel memory if not
+set up properly.
++
+This backend can be used on old btrfs (created without the '-O dedupe' mkfs option).
+When used on old btrfs, this backend needs to be enabled manually after mount.
++
+Designed for fast hash search speed, the in-memory backend keeps all dedupe
+hashes in memory. (Overall performance is still much the same as with the
+'ondisk' backend.)
++
+It keeps only a limited number of hashes in memory to avoid exhausting memory.
+Hashes over the limit are dropped in least-recently-used (LRU) order.
+So this backend has a consistent overhead for a given limit but can\'t ensure
+that all duplicated blocks will be de-duplicated.
++
+After umount and mount, the in-memory backend needs to refill its hash pool.
+
+On-disk backend::
+This backend provides a persistent hash pool, with smarter memory management
+for the hash pool.
+But it\'s not backward-compatible, meaning it must be used with the '-O dedupe' mkfs
+option and older kernels can\'t mount it read-write.
++
+Designed for de-duplication rate, the hash pool is stored as a B+ tree on disk.
+Although this behavior may cause extra disk IO for hash search under extremely
+high memory pressure,
+in most cases the overall performance should be on par with the 'inmemory'
+backend.
++
+After umount and mount, the on-disk backend still has its hashes on disk, so
+there is no need to refill its dedupe hash pool.
+
+DEDUPE BLOCK SIZE
+
+In-band de-duplication is done in units of the dedupe block size.
+Any data smaller than the dedupe block size won\'t go through in-band
+de-duplication.
+
+The dedupe block size heavily affects dedupe rate and fragmentation.
+
+A smaller block size causes more fragmentation, but a higher dedupe rate.
+
+A larger block size causes less fragmentation, but a lower dedupe rate.
+
+The in-band de-duplication rate depends strongly on the workload pattern.
+So it\'s highly recommended to align the dedupe block size to the workload
+block size to make full use of de-duplication.
+
+A dedupe block size larger than 128K makes compression unavailable, as
+compression only supports a maximum extent size of 128K.
 
 EXIT STATUS
 ---
diff --git a/btrfs-completion b/btrfs-completion
index 3ede77b..50f7ea2 100644
--- a/btrfs-completion
+++ b/btrfs-completion
@@ -29,7 +29,7 @@ _btrfs()
 
local cmd=${words[1]}
 
-commands='subvolume filesystem balance device scrub check rescue restore inspect-internal property send receive quota qgroup replace help version'
+commands='subvolume filesystem balance device scrub check rescue restore inspect-internal property send receive quota qgroup dedupe replace help version'
 commands_subvolume='create delete list snapshot find-new get-default set-default show sync'
 commands_filesystem='defragment sync resize show df label u

[PATCH v6 1/8] btrfs-progs: Basic framework for dedupe command group

2016-03-22 Thread Qu Wenruo
Add basic ioctl header and command group framework for later use.
Along with a basic man page.

Signed-off-by: Qu Wenruo 
---
 Documentation/Makefile.in   |  1 +
 Documentation/btrfs-dedupe.asciidoc | 39 ++
 Documentation/btrfs.asciidoc|  4 
 Makefile.in |  3 ++-
 btrfs.c |  1 +
 cmds-dedupe.c   | 48 +
 commands.h  |  2 ++
 ctree.h | 40 ++-
 dedupe.h| 42 
 ioctl.h | 21 
 10 files changed, 199 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/btrfs-dedupe.asciidoc
 create mode 100644 cmds-dedupe.c
 create mode 100644 dedupe.h

diff --git a/Documentation/Makefile.in b/Documentation/Makefile.in
index aea2cb4..24fd35e 100644
--- a/Documentation/Makefile.in
+++ b/Documentation/Makefile.in
@@ -28,6 +28,7 @@ MAN8_TXT += btrfs-qgroup.asciidoc
 MAN8_TXT += btrfs-replace.asciidoc
 MAN8_TXT += btrfs-restore.asciidoc
 MAN8_TXT += btrfs-property.asciidoc
+MAN8_TXT += btrfs-dedupe.asciidoc
 
 # Category 5 manual page
 MAN5_TXT += btrfs-man5.asciidoc
diff --git a/Documentation/btrfs-dedupe.asciidoc b/Documentation/btrfs-dedupe.asciidoc
new file mode 100644
index 000..5d63c32
--- /dev/null
+++ b/Documentation/btrfs-dedupe.asciidoc
@@ -0,0 +1,39 @@
+btrfs-dedupe(8)
+==
+
+NAME
+
+btrfs-dedupe - manage in-band (write time) de-duplication of a btrfs filesystem
+
+SYNOPSIS
+
+*btrfs dedupe*  
+
+DESCRIPTION
+---
+*btrfs dedupe* is used to enable/disable or show current in-band de-duplication
+status of a btrfs filesystem.
+
+Kernel support for in-band de-duplication starts from 4.6.
+
+WARNING: In-band de-duplication is still an experimental feature of btrfs,
+use with caution.
+
+SUBCOMMAND
+--
+Nothing yet
+
+EXIT STATUS
+---
+*btrfs dedupe* returns a zero exit status if it succeeds. Non zero is
+returned in case of failure.
+
+AVAILABILITY
+
+*btrfs* is part of btrfs-progs.
+Please refer to the btrfs wiki http://btrfs.wiki.kernel.org for
+further details.
+
+SEE ALSO
+
+`mkfs.btrfs`(8),
diff --git a/Documentation/btrfs.asciidoc b/Documentation/btrfs.asciidoc
index 6a77a85..8ded842 100644
--- a/Documentation/btrfs.asciidoc
+++ b/Documentation/btrfs.asciidoc
@@ -43,6 +43,10 @@ COMMANDS
Do off-line check on a btrfs filesystem. +
See `btrfs-check`(8) for details.
 
+*dedupe*::
+   Control btrfs in-band(write time) de-duplication. +
+   See `btrfs-dedupe`(8) for details.
+
 *device*::
Manage devices managed by btrfs, including add/delete/scan and so
on. +
diff --git a/Makefile.in b/Makefile.in
index 71ef76d..0b6d7de 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -76,7 +76,8 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
   cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
   cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
   cmds-property.o cmds-fi-usage.o cmds-inspect-dump-tree.o \
-  cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o
+  cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o 
\
+  cmds-dedupe.o
 libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \
   uuid-tree.o utils-lib.o rbtree-utils.o
 libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
diff --git a/btrfs.c b/btrfs.c
index cc70515..c0c8f27 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -199,6 +199,7 @@ static const struct cmd_group btrfs_cmd_group = {
{ "receive", cmd_receive, cmd_receive_usage, NULL, 0 },
{ "quota", cmd_quota, NULL, &quota_cmd_group, 0 },
{ "qgroup", cmd_qgroup, NULL, &qgroup_cmd_group, 0 },
+   { "dedupe", cmd_dedupe, NULL, &dedupe_cmd_group, 0 },
{ "replace", cmd_replace, NULL, &replace_cmd_group, 0 },
{ "help", cmd_help, cmd_help_usage, NULL, 0 },
{ "version", cmd_version, cmd_version_usage, NULL, 0 },
diff --git a/cmds-dedupe.c b/cmds-dedupe.c
new file mode 100644
index 000..b25b8db
--- /dev/null
+++ b/cmds-dedupe.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2015 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy 

[PATCH v6 5/8] btrfs-progs: Add dedupe feature for mkfs and convert

2016-03-22 Thread Qu Wenruo
Add new DEDUPE ro compat flag and corresponding mkfs/convert flag
'dedupe'.

Since the dedupe tree is completely isolated from the fs tree, even an old
kernel can still mount the filesystem read-only.
So add it as an RO compat flag instead of a common incompat flag.

Signed-off-by: Qu Wenruo 
---
 Documentation/mkfs.btrfs.asciidoc |  9 
 btrfs-convert.c   | 19 +++-
 mkfs.c|  8 +--
 utils.c   | 47 +--
 utils.h   |  7 +++---
 5 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/Documentation/mkfs.btrfs.asciidoc b/Documentation/mkfs.btrfs.asciidoc
index 0c43a79..b27e357 100644
--- a/Documentation/mkfs.btrfs.asciidoc
+++ b/Documentation/mkfs.btrfs.asciidoc
@@ -208,6 +208,15 @@ reduced-size metadata for extent references, saves a few percent of metadata
 improved representation of file extents where holes are not explicitly
 stored as an extent, saves a few percent of metadata if sparse files are used
 
+*dedupe*::
+allow btrfs to use the new on-disk format designed for in-band (write time)
+de-duplication.
++
+the on-disk storage backend and persistent de-duplication status need this feature.
++
+this is an RO compat feature, meaning old kernels can still mount the
+filesystem read-only.
+
 BLOCK GROUPS, CHUNKS, RAID
 --
 
diff --git a/btrfs-convert.c b/btrfs-convert.c
index 1768e4e..5224652 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2453,7 +2453,7 @@ static int convert_open_fs(const char *devname,
 
 static int do_convert(const char *devname, int datacsum, int packing, int noxattr,
u32 nodesize, int copylabel, const char *fslabel, int progress,
-   u64 features)
+   u64 features, u64 ro_features)
 {
int i, ret, blocks_per_node;
int fd = -1;
@@ -2504,8 +2504,9 @@ static int do_convert(const char *devname, int datacsum, int packing, int noxatt
fprintf(stderr, "unable to open %s\n", devname);
goto fail;
}
-   btrfs_parse_features_to_string(features_buf, features);
-   if (features == BTRFS_MKFS_DEFAULT_FEATURES)
+   btrfs_parse_features_to_string(features_buf, features, ro_features);
+   if (features == BTRFS_MKFS_DEFAULT_FEATURES &&
+   ro_features == 0)
strcat(features_buf, " (default)");
 
printf("create btrfs filesystem:\n");
@@ -2521,6 +2522,7 @@ static int do_convert(const char *devname, int datacsum, int packing, int noxatt
mkfs_cfg.sectorsize = blocksize;
mkfs_cfg.stripesize = blocksize;
mkfs_cfg.features = features;
+   mkfs_cfg.ro_features = ro_features;
 
ret = make_btrfs(fd, &mkfs_cfg);
if (ret) {
@@ -3071,6 +3073,7 @@ int main(int argc, char *argv[])
char *file;
char fslabel[BTRFS_LABEL_SIZE];
u64 features = BTRFS_MKFS_DEFAULT_FEATURES;
+   u64 ro_features = 0;
 
while(1) {
enum { GETOPT_VAL_NO_PROGRESS = 256 };
@@ -3129,7 +3132,8 @@ int main(int argc, char *argv[])
char *orig = strdup(optarg);
char *tmp = orig;
 
-   tmp = btrfs_parse_fs_features(tmp, &features);
+   tmp = btrfs_parse_fs_features(tmp, &features,
+ &ro_features);
if (tmp) {
fprintf(stderr,
"Unrecognized filesystem feature '%s'\n",
@@ -3147,7 +3151,9 @@ int main(int argc, char *argv[])
char buf[64];
 
btrfs_parse_features_to_string(buf,
-   features & ~BTRFS_CONVERT_ALLOWED_FEATURES);
+   features &
+   ~BTRFS_CONVERT_ALLOWED_FEATURES,
+   ro_features);
fprintf(stderr,
"ERROR: features not allowed for convert: %s\n",
buf);
@@ -3197,7 +3203,8 @@ int main(int argc, char *argv[])
ret = do_rollback(file);
} else {
ret = do_convert(file, datacsum, packing, noxattr, nodesize,
-   copylabel, fslabel, progress, features);
+   copylabel, fslabel, progress, features,
+   ro_features);
}
if (ret)
return 1;
diff --git a/mkfs.c b/mkfs.c
index 5e79e0b..5071060 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1369,6 +1369,7 @@ int main(int argc, char **argv)
int saved_optind;
char fs_uuid[BTRFS_UUID_UNPARSED_SIZE] = { 0 };

[PATCH v6 6/8] btrfs-progs: Add show-super support for new DEDUPE flag

2016-03-22 Thread Qu Wenruo
Now btrfs-show-super can handle DEDUPE ro compat flag.

Signed-off-by: Qu Wenruo 
---
 cmds-inspect-dump-super.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/cmds-inspect-dump-super.c b/cmds-inspect-dump-super.c
index 3e09ee8..6a939c9 100644
--- a/cmds-inspect-dump-super.c
+++ b/cmds-inspect-dump-super.c
@@ -198,6 +198,16 @@ struct readable_flag_entry {
char *output;
 };
 
+#define DEF_RO_COMPAT_FLAG_ENTRY(bit_name) \
+   {BTRFS_FEATURE_COMPAT_RO_##bit_name, #bit_name}
+
+struct readable_flag_entry ro_compat_flags_array[] = {
+   DEF_RO_COMPAT_FLAG_ENTRY(DEDUPE)
+};
+
+static const int ro_compat_flags_num = sizeof(ro_compat_flags_array) /
+ sizeof(struct readable_flag_entry);
+
 #define DEF_INCOMPAT_FLAG_ENTRY(bit_name)  \
{BTRFS_FEATURE_INCOMPAT_##bit_name, #bit_name}
 
@@ -269,6 +279,13 @@ static void __print_readable_flag(u64 flag, struct readable_flag_entry *array,
printf(")\n");
 }
 
+static void print_readable_ro_compat_flag(u64 ro_flag)
+{
+   return __print_readable_flag(ro_flag, ro_compat_flags_array,
+ro_compat_flags_num,
+BTRFS_FEATURE_COMPAT_RO_SUPP);
+}
+
 static void print_readable_incompat_flag(u64 flag)
 {
return __print_readable_flag(flag, incompat_flags_array,
@@ -360,6 +377,7 @@ static void dump_superblock(struct btrfs_super_block *sb, int full)
   (unsigned long long)btrfs_super_compat_flags(sb));
printf("compat_ro_flags\t\t0x%llx\n",
   (unsigned long long)btrfs_super_compat_ro_flags(sb));
+   print_readable_ro_compat_flag(btrfs_super_compat_ro_flags(sb));
printf("incompat_flags\t\t0x%llx\n",
   (unsigned long long)btrfs_super_incompat_flags(sb));
print_readable_incompat_flag(btrfs_super_incompat_flags(sb));
-- 
2.7.4
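
The table above relies on the preprocessor to derive both the flag constant
and its printable name from a single token. A minimal, self-contained sketch
of that idea follows. The EXAMPLE_COMPAT_RO_DEDUPE bit value is an assumption
made for illustration (the real definition is not shown in this patch), and
struct flag_entry, DEF_EXAMPLE_FLAG_ENTRY and describe_flags are hypothetical
names, not part of btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit value, for illustration only. */
#define EXAMPLE_COMPAT_RO_DEDUPE (1ULL << 2)

struct flag_entry {
    uint64_t bit;
    const char *output;
};

/* ## pastes the token into the constant name, # turns it into a string. */
#define DEF_EXAMPLE_FLAG_ENTRY(bit_name) \
    { EXAMPLE_COMPAT_RO_##bit_name, #bit_name }

static const struct flag_entry example_ro_flags[] = {
    DEF_EXAMPLE_FLAG_ENTRY(DEDUPE),
};

/*
 * Write the names of all known bits set in flags into buf, separated
 * by '|'. Names that no longer fit are skipped entirely. Returns the
 * number of names written.
 */
int describe_flags(uint64_t flags, char *buf, size_t len)
{
    size_t n = sizeof(example_ro_flags) / sizeof(example_ro_flags[0]);
    size_t used = 0;
    int found = 0;

    assert(buf != NULL);
    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        const char *name = example_ro_flags[i].output;
        size_t name_len = strlen(name);
        size_t need = name_len + (found ? 1 : 0);

        if (!(flags & example_ro_flags[i].bit))
            continue;
        if (used + need + 1 > len)
            break;                      /* no room left for this name */
        if (found)
            buf[used++] = '|';
        memcpy(buf + used, name, name_len + 1); /* copies the NUL too */
        used += name_len;
        found++;
    }
    return found;
}
```

Adding another flag to such a table only takes one more
DEF_EXAMPLE_FLAG_ENTRY() line, which is presumably why the patch mirrors the
existing incompat flag table rather than open-coding the names.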





[PATCH v6 4/8] btrfs-progs: dedupe: Add status subcommand

2016-03-22 Thread Qu Wenruo
Add status subcommand for dedupe command group.

Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-dedupe.asciidoc |  3 ++
 btrfs-completion|  2 +-
 cmds-dedupe.c   | 84 +
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-dedupe.asciidoc b/Documentation/btrfs-dedupe.asciidoc
index 28fe05f..5a5bf52 100644
--- a/Documentation/btrfs-dedupe.asciidoc
+++ b/Documentation/btrfs-dedupe.asciidoc
@@ -73,6 +73,9 @@ WARNING: Too large value for '-l' or '-m' will easily trigger OOM.
 Please use with caution according to system memory or use 'ondisk' backend
 if memory usage is critical.
 
+*status* ::
+Show current in-band de-duplication status of a filesystem.
+
 BACKENDS
 
 Btrfs in-band de-duplication support two different backends with their own
diff --git a/btrfs-completion b/btrfs-completion
index 9a6c73b..fbaae0c 100644
--- a/btrfs-completion
+++ b/btrfs-completion
@@ -40,7 +40,7 @@ _btrfs()
 commands_property='get set list'
 commands_quota='enable disable rescan'
 commands_qgroup='assign remove create destroy show limit'
-commands_dedupe='enable disable'
+commands_dedupe='enable disable status'
 commands_replace='start status cancel'
 
if [[ "$cur" == -* && $cword -le 3 && "$cmd" != "help" ]]; then
diff --git a/cmds-dedupe.c b/cmds-dedupe.c
index 64ac0f2..8005b6e 100644
--- a/cmds-dedupe.c
+++ b/cmds-dedupe.c
@@ -230,11 +230,95 @@ out:
return 0;
 }
 
+static const char * const cmd_dedupe_status_usage[] = {
+   "btrfs dedupe status ",
+   "Show current in-band(write time) de-duplication status of a btrfs.",
+   NULL
+};
+
+static int cmd_dedupe_status(int argc, char **argv)
+{
+   struct btrfs_ioctl_dedupe_args dargs;
+   DIR *dirstream = NULL;
+   char *path;
+   int fd;
+   int ret;
+   int print_limit = 1;
+
+   if (check_argc_exact(argc, 2))
+       usage(cmd_dedupe_status_usage);
+
+   path = argv[1];
+   fd = open_file_or_dir(path, &dirstream);
+   if (fd < 0) {
+       error("failed to open file or directory: %s", path);
+       ret = 1;
+       goto out;
+   }
+   memset(&dargs, 0, sizeof(dargs));
+   dargs.cmd = BTRFS_DEDUPE_CTL_STATUS;
+
+   ret = ioctl(fd, BTRFS_IOC_DEDUPE_CTL, &dargs);
+   if (ret < 0) {
+       error("failed to get inband deduplication status: %s",
+             strerror(errno));
+       ret = 1;
+       goto out;
+   }
+   ret = 0;
+   if (dargs.status == 0) {
+       printf("Status:\t\t\tDisabled\n");
+       goto out;
+   }
+   printf("Status:\t\t\tEnabled\n");
+
+   if (dargs.hash_type == BTRFS_DEDUPE_HASH_SHA256)
+       printf("Hash algorithm:\t\tSHA-256\n");
+   else
+       printf("Hash algorithm:\t\tUnrecognized(%x)\n",
+           dargs.hash_type);
+
+   if (dargs.backend == BTRFS_DEDUPE_BACKEND_INMEMORY) {
+       printf("Backend:\t\tIn-memory\n");
+       print_limit = 1;
+   } else if (dargs.backend == BTRFS_DEDUPE_BACKEND_ONDISK) {
+       printf("Backend:\t\tOn-disk\n");
+       print_limit = 0;
+   } else  {
+       printf("Backend:\t\tUnrecognized(%x)\n",
+           dargs.backend);
+   }
+
+   printf("Dedup Blocksize:\t%llu\n", dargs.blocksize);
+
+   if (print_limit) {
+       u64 cur_mem;
+
+       /* Limit nr may be 0 */
+       if (dargs.limit_nr)
+           cur_mem = dargs.current_nr * (dargs.limit_mem /
+               dargs.limit_nr);
+       else
+           cur_mem = 0;
+
+       printf("Number of hash: \t[%llu/%llu]\n", dargs.current_nr,
+           dargs.limit_nr);
+       printf("Memory usage: \t\t[%s/%s]\n",
+           pretty_size(cur_mem),
+           pretty_size(dargs.limit_mem));
+   }
+out:
+   close_file_or_dir(fd, dirstream);
+   return ret;
+}
+
 const struct cmd_group dedupe_cmd_group = {
dedupe_cmd_group_usage, dedupe_cmd_group_info, {
{ "enable", cmd_dedupe_enable, cmd_dedupe_enable_usage, NULL, 0},
{ "disable", cmd_dedupe_disable, cmd_dedupe_disable_usage,
  NULL, 0},
+   { "status", cmd_dedupe_status, cmd_dedupe_status_usage,
+ NULL, 0},
NULL_CMD_STRUCT
}
 };
-- 
2.7.4
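
The "Memory usage" line in the status output is an estimate rather than a
value reported by the kernel: the per-hash cost is taken as limit_mem divided
by limit_nr, and that is multiplied by the number of hashes currently stored.
A small worked sketch of the arithmetic follows. The function name
estimate_cur_mem and the sample numbers are illustrative assumptions, not
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the memory held by the in-memory hash pool.
 * per_hash is an integer division, so the result can be slightly
 * lower than limit_mem even when the pool is full.
 */
uint64_t estimate_cur_mem(uint64_t current_nr, uint64_t limit_nr,
                          uint64_t limit_mem)
{
    uint64_t per_hash;

    /* The limit may be 0; report 0 instead of dividing by zero. */
    if (limit_nr == 0)
        return 0;

    per_hash = limit_mem / limit_nr;
    /* Guard against wrap-around of the 64-bit product. */
    assert(per_hash == 0 || current_nr <= UINT64_MAX / per_hash);
    return current_nr * per_hash;
}
```

With a limit of 32768 hashes and 4194304 bytes, each hash is costed at 128
bytes, so 1000 stored hashes are reported as 128000 bytes.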





[PATCH v6 8/8] btrfs-progs: property: add a dedupe property

2016-03-22 Thread Qu Wenruo
From: Wang Xiaoguang 

Normally if we enable online dedupe for a fs, it's filesystem wide
de-duplication. With this property, we can explicitly disable data
de-duplication for specified files.

Signed-off-by: Wang Xiaoguang 
---
 Documentation/btrfs-property.asciidoc |  2 +
 props.c   | 73 +++
 2 files changed, 75 insertions(+)

diff --git a/Documentation/btrfs-property.asciidoc b/Documentation/btrfs-property.asciidoc
index 8b9b7f0..ca90035 100644
--- a/Documentation/btrfs-property.asciidoc
+++ b/Documentation/btrfs-property.asciidoc
@@ -44,6 +44,8 @@ label
 label of device
 compression
 compression setting for an inode: lzo, zlib, or "" (empty string)
+dedupe
+online dedupe setting for an inode: disable or "" (empty string)
 
 *list* [-t ] ::
 Lists available properties with their descriptions for the given object.
diff --git a/props.c b/props.c
index 5b74932..d8f6925 100644
--- a/props.c
+++ b/props.c
@@ -187,6 +187,77 @@ out:
return ret;
 }
 
+static int prop_dedupe(enum prop_object_type type, const char *object,
+   const char *name, const char *value)
+{
+   int ret;
+   ssize_t sret;
+   int fd = -1;
+   DIR *dirstream = NULL;
+   char *buf = NULL;
+   char *xattr_name = NULL;
+   int open_flags = value ? O_RDWR : O_RDONLY;
+
+   fd = open_file_or_dir3(object, &dirstream, open_flags);
+   if (fd == -1) {
+   ret = -errno;
+   fprintf(stderr, "ERROR: open %s failed. %s\n",
+   object, strerror(-ret));
+   goto out;
+   }
+
+   xattr_name = malloc(XATTR_BTRFS_PREFIX_LEN + strlen(name) + 1);
+   if (!xattr_name) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   memcpy(xattr_name, XATTR_BTRFS_PREFIX, XATTR_BTRFS_PREFIX_LEN);
+   memcpy(xattr_name + XATTR_BTRFS_PREFIX_LEN, name, strlen(name));
+   xattr_name[XATTR_BTRFS_PREFIX_LEN + strlen(name)] = '\0';
+
+   if (value)
+   sret = fsetxattr(fd, xattr_name, value, strlen(value), 0);
+   else
+   sret = fgetxattr(fd, xattr_name, NULL, 0);
+   if (sret < 0) {
+   ret = -errno;
+   if (ret != -ENOATTR)
+   fprintf(stderr,
+   "ERROR: failed to %s dedupe for %s. %s\n",
+   value ? "set" : "get", object, strerror(-ret));
+   else
+   ret = 0;
+   goto out;
+   }
+   if (!value) {
+   size_t len = sret;
+
+   buf = malloc(len);
+   if (!buf) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   sret = fgetxattr(fd, xattr_name, buf, len);
+   if (sret < 0) {
+   ret = -errno;
+   fprintf(stderr,
+   "ERROR: failed to get dedupe for %s. %s\n",
+   object, strerror(-ret));
+   goto out;
+   }
+   fprintf(stdout, "dedupe=%.*s\n", (int)len, buf);
+   }
+
+   ret = 0;
+out:
+   free(xattr_name);
+   free(buf);
+   if (fd >= 0)
+   close_file_or_dir(fd, dirstream);
+
+   return ret;
+}
+
 const struct prop_handler prop_handlers[] = {
{"ro", "Set/get read-only flag of subvolume.", 0, prop_object_subvol,
 prop_read_only},
@@ -194,5 +265,7 @@ const struct prop_handler prop_handlers[] = {
 prop_object_dev | prop_object_root, prop_label},
{"compression", "Set/get compression for a file or directory", 0,
 prop_object_inode, prop_compression},
+   {"dedupe", "Set/get dedupe for a file or directory", 0,
+prop_object_inode, prop_dedupe},
{NULL, NULL, 0, 0, NULL}
 };
-- 
2.7.4
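
prop_dedupe() stores the setting as an extended attribute whose name is the
btrfs xattr prefix followed by the property name. A minimal sketch of that
name construction follows. The prefix value "btrfs." is an assumption here,
since XATTR_BTRFS_PREFIX is not defined in this patch, and
EXAMPLE_XATTR_PREFIX and build_xattr_name are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value of XATTR_BTRFS_PREFIX; not shown in the patch. */
#define EXAMPLE_XATTR_PREFIX "btrfs."

/*
 * Return a newly allocated "<prefix><name>" string, or NULL if the
 * allocation fails. The caller owns the result and must free() it.
 */
char *build_xattr_name(const char *name)
{
    size_t prefix_len = strlen(EXAMPLE_XATTR_PREFIX);
    size_t name_len;
    char *out;

    assert(name != NULL);
    name_len = strlen(name);

    out = malloc(prefix_len + name_len + 1);
    if (!out)
        return NULL;

    memcpy(out, EXAMPLE_XATTR_PREFIX, prefix_len);
    memcpy(out + prefix_len, name, name_len + 1);   /* includes the NUL */
    return out;
}
```

For the property name "dedupe" this yields "btrfs.dedupe" under the assumed
prefix, which is the string that would then be handed to fsetxattr() or
fgetxattr().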





[PATCH v6 7/8] btrfs-progs: debug-tree: Add dedupe tree support

2016-03-22 Thread Qu Wenruo
Add dedupe tree support for btrfs-debug-tree.

Signed-off-by: Qu Wenruo 
---
 cmds-inspect-dump-tree.c |   4 ++
 ctree.h  |   7 
 print-tree.c | 105 +++
 3 files changed, 116 insertions(+)

diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index 43c8b67..0c75a3c 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -496,6 +496,10 @@ again:
printf("multiple");
}
break;
+   case BTRFS_DEDUPE_TREE_OBJECTID:
+   if (!skip)
+   printf("dedupe");
+   break;
default:
if (!skip) {
printf("file");
diff --git a/ctree.h b/ctree.h
index d251f65..45acd52 100644
--- a/ctree.h
+++ b/ctree.h
@@ -79,6 +79,9 @@ struct btrfs_free_space_ctl;
 /* tracks free space in block groups. */
 #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
 
+/* on-disk dedupe tree (EXPERIMENTAL) */
+#define BTRFS_DEDUPE_TREE_OBJECTID 11ULL
+
 /* for storing balance parameters in the root tree */
 #define BTRFS_BALANCE_OBJECTID -4ULL
 
@@ -1219,6 +1222,10 @@ struct btrfs_root {
 #define BTRFS_DEV_ITEM_KEY 216
 #define BTRFS_CHUNK_ITEM_KEY   228
 
+#define BTRFS_DEDUPE_STATUS_ITEM_KEY   230
+#define BTRFS_DEDUPE_HASH_ITEM_KEY 231
+#define BTRFS_DEDUPE_BYTENR_ITEM_KEY   232
+
 #define BTRFS_BALANCE_ITEM_KEY 248
 
 /*
diff --git a/print-tree.c b/print-tree.c
index d0f37a5..72b9ed6 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -25,6 +25,7 @@
 #include "disk-io.h"
 #include "print-tree.h"
 #include "utils.h"
+#include "dedupe.h"
 
 
 static void print_dir_item_type(struct extent_buffer *eb,
@@ -687,11 +688,31 @@ static void print_key_type(u64 objectid, u8 type)
case BTRFS_UUID_KEY_RECEIVED_SUBVOL:
printf("UUID_KEY_RECEIVED_SUBVOL");
break;
+   case BTRFS_DEDUPE_STATUS_ITEM_KEY:
+   printf("DEDUPE_STATUS_ITEM");
+   break;
+   case BTRFS_DEDUPE_HASH_ITEM_KEY:
+   printf("DEDUPE_HASH_ITEM");
+   break;
+   case BTRFS_DEDUPE_BYTENR_ITEM_KEY:
+   printf("DEDUPE_BYTENR_ITEM");
+   break;
default:
printf("UNKNOWN.%d", type);
};
 }
 
+static void print_64bit_hash(u64 hash)
+{
+   int i;
+   unsigned char buf[8];
+
+   memcpy(buf, &hash, 8);
+   printf("0x");
+   for (i = 0; i < 8; i++)
+   printf("%02x", buf[i]);
+}
+
 static void print_objectid(u64 objectid, u8 type)
 {
switch (type) {
@@ -706,6 +727,9 @@ static void print_objectid(u64 objectid, u8 type)
case BTRFS_UUID_KEY_RECEIVED_SUBVOL:
printf("0x%016llx", (unsigned long long)objectid);
return;
+   case BTRFS_DEDUPE_HASH_ITEM_KEY:
+   print_64bit_hash(objectid);
+   return;
}
 
switch (objectid) {
@@ -772,6 +796,9 @@ static void print_objectid(u64 objectid, u8 type)
case BTRFS_MULTIPLE_OBJECTIDS:
printf("MULTIPLE");
break;
+   case BTRFS_DEDUPE_TREE_OBJECTID:
+   printf("DEDUPE_TREE");
+   break;
case (u64)-1:
printf("-1");
break;
@@ -807,6 +834,9 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key)
case BTRFS_UUID_KEY_RECEIVED_SUBVOL:
printf(" 0x%016llx)", (unsigned long long)offset);
break;
+   case BTRFS_DEDUPE_BYTENR_ITEM_KEY:
+   print_64bit_hash(offset);
+   break;
default:
if (offset == (u64)-1)
printf(" -1)");
@@ -835,6 +865,71 @@ static void print_uuid_item(struct extent_buffer *l, unsigned long offset,
}
 }
 
+static void print_dedupe_status(struct extent_buffer *node, int slot)
+{
+   struct btrfs_dedupe_status_item *status_item;
+   u64 blocksize;
+   u64 limit;
+   u16 hash_type;
+   u16 backend;
+
+   status_item = btrfs_item_ptr(node, slot,
+   struct btrfs_dedupe_status_item);
+   blocksize = btrfs_dedupe_status_blocksize(node, status_item);
+   limit = btrfs_dedupe_status_limit(node, status_item);
+   hash_type = btrfs_dedupe_status_hash_type(node, status_item);
+   backend = btrfs_dedupe_status_backend(node, status_item);
+
+   printf("\t\tdedupe status item ");
+   if (backend == BTRFS_DEDUPE_BACKEND_INMEMORY)
+   printf("backend: inmemory\n");
+   else if (backend == BTRFS_DEDUPE_BACKEND_ONDISK)
+   printf("backend: ondisk\n");
+   else
+   printf("backend: Unrecognized(%u)\n", backend);
+
+   if (hash_type == BTRFS_DEDUPE_HASH_SHA256)
+   

csum errors in VirtualBox VDI files

2016-03-22 Thread Kai Krakow
Hello!

Since one of the last kernel updates (I don't know which exactly), I'm
experiencing csum errors within VDI files when running VirtualBox. A
side effect of this is, as soon as dmesg shows these errors, commands
like "du" and "df" hang until reboot.

I've now restored the file from backup but it happens over and over
again.

On another machine I'm also seeing errors with big files in the
following scenario (apparently an older kernel, 4.1.x I afair):

# ntfsclone --save /dev/md126p2 -o rescue.ntfs.img
   ^ big NTFS partition   ^ file on btrfs

results in a write error and the file system goes read-only.

Both systems have in common they are using btrfs on bcache with
compress=lzo,autodefrag,nossd,discard (mraid=1,draid=0 and
mraid=1,draid=single).

The system mentioned first is running Kernel 4.5.0 with Gentoo
patch-set. I upgraded from the last 4.4.x kernel when I first
experienced this problem. The first time the problem resulted in a
duplicate extent which btrfsck wasn't able to fix, that's when I first
restored from backup. But now I'm getting csum errors in this file over
and over again, plus when rsync has run for backup, the system no longer
responds to "du" and "df" commands - it just hangs.

Known problem? Does it help if I send debug info? If so, please
instruct.

-- 
Regards,
Kai

Replies to list-only preferred.



Re: csum errors in VirtualBox VDI files

2016-03-22 Thread Kai Krakow
Am Tue, 22 Mar 2016 09:03:42 +0100
schrieb Kai Krakow :

> Hello!
> 
> Since one of the last kernel updates (I don't know which exactly), I'm
> experiencing csum errors within VDI files when running VirtualBox. A
> side effect of this is, as soon as dmesg shows these errors, commands
> like "du" and "df" hang until reboot.
> 
> I've now restored the file from backup but it happens over and over
> again.
> 
> On another machine I'm also seeing errors with big files in the
> following scenario (apparently an older kernel, 4.1.x I afair):
> 
> # ntfsclone --save /dev/md126p2 -o rescue.ntfs.img
>^ big NTFS partition   ^ file on btrfs
> 
> results in a write error and the file system goes read-only.
> 
> Both systems have in common they are using btrfs on bcache with
> compress=lzo,autodefrag,nossd,discard (mraid=1,draid=0 and
> mraid=1,draid=single).
> 
> The system mentioned first is running Kernel 4.5.0 with Gentoo
> patch-set. I upgraded from the last 4.4.x kernel when I first
> experienced this problem. The first time the problem resulted in a
> duplicate extent which btrfsck wasn't able to fix, that's when I first
> restored from backup. But now I'm getting csum errors in this file
> over and over again, plus when rsync has run for backup, the system no
> longer responds to "du" and "df" commands - it just hangs.
> 
> Known problem? Does it help if I send debug info? If so, please
> instruct.

Interestingly, ddrescue just skips over these csum errors without
counting an error...


-- 
Regards,
Kai

Replies to list-only preferred.



Re: csum errors in VirtualBox VDI files

2016-03-22 Thread Qu Wenruo

Hi,

Kai Krakow wrote on 2016/03/22 09:03 +0100:

> Hello!
>
> Since one of the last kernel updates (I don't know which exactly), I'm
> experiencing csum errors within VDI files when running VirtualBox. A
> side effect of this is, as soon as dmesg shows these errors, commands
> like "du" and "df" hang until reboot.
>
> I've now restored the file from backup but it happens over and over
> again.
>
> On another machine I'm also seeing errors with big files in the
> following scenario (apparently an older kernel, 4.1.x afair):
>
> # ntfsclone --save /dev/md126p2 -o rescue.ntfs.img
>    ^ big NTFS partition   ^ file on btrfs
>
> results in a write error and the file system goes read-only.


When it goes RO, there must be some warning in the kernel log.
Would you please paste the kernel log?



> Both systems have in common they are using btrfs on bcache with
> compress=lzo,autodefrag,nossd,discard (mraid=1,draid=0 and
> mraid=1,draid=single).
>
> The system mentioned first is running Kernel 4.5.0 with Gentoo
> patch-set. I upgraded from the last 4.4.x kernel when I first
> experienced this problem. The first time the problem resulted in a
> duplicate extent which btrfsck wasn't able to fix, that's when I first
> restored from backup. But now I'm getting csum errors in this file over
> and over again, plus when rsync has run for backup, the system no longer
> responds to "du" and "df" commands - it just hangs.
>
> Known problem? Does it help if I send debug info? If so, please
> instruct.


Does btrfs check report anything wrong?

Thanks,
Qu




Re: csum errors in VirtualBox VDI files

2016-03-22 Thread Kai Krakow
On Tue, 22 Mar 2016 09:03:42 +0100,
Kai Krakow wrote:

> Hello!
> 
> Since one of the last kernel updates (I don't know which exactly), I'm
> experiencing csum errors within VDI files when running VirtualBox. A
> side effect of this is, as soon as dmesg shows these errors, commands
> like "du" and "df" hang until reboot.
> 
> I've now restored the file from backup but it happens over and over
> again.
> 
> On another machine I'm also seeing errors with big files in the
> following scenario (apparently an older kernel, 4.1.x afair):
> 
> # ntfsclone --save /dev/md126p2 -o rescue.ntfs.img
>    ^ big NTFS partition   ^ file on btrfs
> 
> results in a write error and the file system goes read-only.
> 
> Both systems have in common they are using btrfs on bcache with
> compress=lzo,autodefrag,nossd,discard (mraid=1,draid=0 and
> mraid=1,draid=single).
> 
> The system mentioned first is running Kernel 4.5.0 with Gentoo
> patch-set. I upgraded from the last 4.4.x kernel when I first
> experienced this problem. The first time the problem resulted in a
> duplicate extent which btrfsck wasn't able to fix, that's when I first
> restored from backup. But now I'm getting csum errors in this file
> over and over again, plus when rsync has run for backup, the system no
> longer responds to "du" and "df" commands - it just hangs.
> 
> Known problem? Does it help if I send debug info? If so, please
> instruct.
> 

[ 3073.426785] BTRFS info (device bcache2): no csum found for inode 7528856 start 79614263296
[ 3073.447613] BTRFS info (device bcache2): csum failed ino 7528856 extent 128363245568 csum 3730946112 wanted 0 mirror 0
[ 3073.554746] BTRFS info (device bcache2): no csum found for inode 7528856 start 79614263296
[ 3073.590864] BTRFS info (device bcache2): csum failed ino 7528856 extent 128363245568 csum 3730946112 wanted 0 mirror 0
[ 3073.590930] BTRFS info (device bcache2): no csum found for inode 7528856 start 79614263296
[ 3073.617864] BTRFS info (device bcache2): csum failed ino 7528856 extent 128363245568 csum 3730946112 wanted 0 mirror 0
[ 3073.618057] BTRFS info (device bcache2): no csum found for inode 7528856 start 79614263296
[ 3073.644597] BTRFS info (device bcache2): csum failed ino 7528856 extent 412938412032 csum 3730946112 wanted 0 mirror 0


-- 
Regards,
Kai

Replies to list-only preferred.



Re: [PATCH] Btrfs: fix invalid reference in replace_path

2016-03-22 Thread David Sterba
On Mon, Mar 21, 2016 at 02:59:53PM -0700, Liu Bo wrote:
> Dan Carpenter's static checker has found this error, it's introduced by
> commit 64c043de466d
> ("Btrfs: fix up read_tree_block to return proper error")
> 
> It's really supposed to 'break' the loop on error like others.
> 
> Cc: Dan Carpenter 
> Reported-by: Dan Carpenter  
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 


Re: [PATCH] btrfs: fix build warning

2016-03-22 Thread Geert Uytterhoeven
On Tue, Feb 16, 2016 at 9:02 AM, Sudip Mukherjee
 wrote:
> We were getting build warning about:
> fs/btrfs/extent-tree.c:7021:34: warning: ‘used_bg’ may be used
> uninitialized in this function
>
> It is not a valid warning as used_bg is never used uninitialized since
> locked is initially false so we can never be in the section where
> 'used_bg' is used. But gcc is not able to understand that and we can
> initialize it while declaring to silence the warning.
>
> Signed-off-by: Sudip Mukherjee 

FWIW, I've posted an alternative patch that killed the silly locked variable
a while ago.
"[PATCH] Btrfs: Refactor btrfs_lock_cluster() to kill compiler warning"
https://lkml.org/lkml/2014/6/22/96

> ---
>  fs/btrfs/extent-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index e2287c7..f24e4c3 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -7018,7 +7018,7 @@ btrfs_lock_cluster(struct btrfs_block_group_cache *block_group,
>struct btrfs_free_cluster *cluster,
>int delalloc)
>  {
> -   struct btrfs_block_group_cache *used_bg;
> +   struct btrfs_block_group_cache *used_bg = NULL;
> bool locked = false;
>  again:
> spin_lock(&cluster->refill_lock);
> --
> 1.9.1

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] btrfs: fix build warning

2016-03-22 Thread David Sterba
On Tue, Mar 22, 2016 at 10:39:59AM +0100, Geert Uytterhoeven wrote:
> On Tue, Feb 16, 2016 at 9:02 AM, Sudip Mukherjee
>  wrote:
> > We were getting build warning about:
> > fs/btrfs/extent-tree.c:7021:34: warning: ‘used_bg’ may be used
> > uninitialized in this function
> >
> > It is not a valid warning as used_bg is never used uninitialized since
> > locked is initially false so we can never be in the section where
> > 'used_bg' is used. But gcc is not able to understand that and we can
> > initialize it while declaring to silence the warning.
> >
> > Signed-off-by: Sudip Mukherjee 
> 
> FWIW, I've posted an alternative patch that killed the silly locked variable
> a while ago.
> "[PATCH] Btrfs: Refactor btrfs_lock_cluster() to kill compiler warning"
> https://lkml.org/lkml/2014/6/22/96

The cleanup looks great, thanks, patch picked.


Re: [PULL] Btrfs, misc for 4.6

2016-03-22 Thread David Sterba
On Mon, Mar 21, 2016 at 07:42:09PM +0100, David Sterba wrote:
> Hi,
> 
> a few low-risk patches that appeared after I sent my main pull for 4.6. I think
> they still apply to 4.6, the freezer patches help the livepatch to clean up the
> mis-use of the freezer API, the rest makes sure we don't miss an error during
> writing checksums.
> 
> I've removed Liu Bo's patch fixing the writeback_index, IMO needs more review
> and can be merged in some later rc.
> 
> 
> The following changes since commit 5e33a2bd7ca7fa687fb0965869196eea6815d1f3:
>   Btrfs: do not collect ordered extents when logging that inode exists (2016-03-01 08:23:47 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris-4.6
> 
> for you to fetch changes up to 45147cbef21305b512a5b263e9e52f832a0df7b2:
> 
>   btrfs: transaction_kthread() is not freezable (2016-03-21 19:02:34 +0100)

FYI, I've added Filipe's reviewed-by, the top commit is now
ce63f891e1a87ae79c4325dad5f512e8d6a8a78e


[PATCH] btrfs-progs: utils: make sure set_label_mounted uses correct length buffers

2016-03-22 Thread Petros Angelatos
When `btrfs filesystem label /foo bar` command is invoked, it will pass
the buffer allocated in the argv array directly to set_label_mounted()
and then to the BTRFS_IOC_SET_FSLABEL ioctl.

However, the kernel code handling the ioctl will always try to copy
BTRFS_LABEL_SIZE bytes[1] from the userland pointer. Under certain
conditions and when the label is small enough, the command will fail
with:

[root@localhost /]# btrfs filesystem label /mnt f
ERROR: unable to set label Bad address

Fix this by making sure we pass a BTRFS_LABEL_SIZE sized buffer to the
ioctl containing the desired label.

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/ioctl.c?id=refs/tags/v4.5#n5231

Signed-off-by: Petros Angelatos 
---
 utils.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/utils.c b/utils.c
index c0c564e..ec6c613 100644
--- a/utils.c
+++ b/utils.c
@@ -1755,9 +1755,10 @@ static int set_label_unmounted(const char *dev, const char *label)
return 0;
 }
 
-static int set_label_mounted(const char *mount_path, const char *label)
+static int set_label_mounted(const char *mount_path, const char *labelp)
 {
int fd;
+   char label[BTRFS_LABEL_SIZE];
 
fd = open(mount_path, O_RDONLY | O_NOATIME);
if (fd < 0) {
@@ -1765,6 +1766,8 @@ static int set_label_mounted(const char *mount_path, const char *label)
return -1;
}
 
+   memset(label, '\0', sizeof(label));
+   strncpy(label, labelp, sizeof(label));
if (ioctl(fd, BTRFS_IOC_SET_FSLABEL, label) < 0) {
fprintf(stderr, "ERROR: unable to set label %s\n",
strerror(errno));
-- 
2.7.2
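
A minimal sketch of the buffer handling this patch is after, for illustration
only and not the code that was merged. The helper name fill_label_buffer and
the local LABEL_SIZE constant are made up for this example; LABEL_SIZE stands
in for BTRFS_LABEL_SIZE (256 in the kernel headers). Unlike the strncpy() call
in the patch, the sketch rejects labels that leave no room for the terminating
NUL, because strncpy() with the full buffer size does not terminate the copy
when the source is that long or longer.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for BTRFS_LABEL_SIZE so the sketch builds without btrfs headers. */
#define LABEL_SIZE 256

/*
 * Hypothetical helper: copy a user-supplied label into a zero-filled,
 * fixed-size buffer that is safe to hand to an ioctl which always reads
 * LABEL_SIZE bytes. Returns 0 on success, -1 if the label (plus its
 * terminating NUL) does not fit.
 */
static int fill_label_buffer(char out[LABEL_SIZE], const char *labelp)
{
	size_t len = strlen(labelp);

	if (len >= LABEL_SIZE)
		return -1;

	/* Zero the whole buffer so every byte the kernel copies is defined. */
	memset(out, 0, LABEL_SIZE);
	memcpy(out, labelp, len);
	return 0;
}
```

A caller would pass the filled buffer, rather than the argv pointer, to the
BTRFS_IOC_SET_FSLABEL ioctl.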



Re: [PATCH V15 00/15] Btrfs: Subpagesize-blocksize: Allow I/O on blocks whose size is less than page size

2016-03-22 Thread David Sterba
On Thu, Feb 11, 2016 at 11:17:38PM +0530, Chandan Rajendra wrote:
> this patchset temporarily disables the commit
> f82c458a2c3ffb94b431fc6ad791a79df1b3713e.
> 
> The commits for the Btrfs kernel module can be found at
> https://github.com/chandanr/linux/tree/btrfs/subpagesize-blocksize.

The branch does not apply cleanly to at least 4.5, I've tried to rebase
it but there are conflicts that are not simple. Please update it on top
of current master, ie. with the preparatory patchset merged.


Re: [PATCH v2] fstests: test fsync on overlayfs merged directory

2016-03-22 Thread Filipe Manana
On Tue, Mar 22, 2016 at 3:00 AM, Eryu Guan  wrote:
> On Mon, Mar 21, 2016 at 05:50:25PM +, fdman...@kernel.org wrote:
>> From: Filipe Manana 
>>
>> Test that calling fsync against a file using the merged directory does
>> not result in a crash nor fails unexpectedly.
>>
>> This is motivated by a change in overlayfs that resulted in a crash
>> (invalid memory access) when the lower or upper directory belonged to
>> a btrfs file system. The overlayfs change came in commit 4bacc9c9234
>> (overlayfs: Make f_path always point to the overlay and f_inode to the
>> underlay). At the moment there are two patches in the linux-fsdevel
>> and linux-btrfs mailing lists to fix this problem:
>>
>>   * vfs: add file_dentry()
>>   * Btrfs: fix crash/invalid memory access on fsync when using overlayfs
>>
>> Signed-off-by: Filipe Manana 
>> ---
>>
>> V2: Removed leftover comment from debugging.
>>
>>  tests/overlay/002 | 74 +++
>>  tests/overlay/002.out |  3 +++
>>  tests/overlay/group   |  1 +
>>  3 files changed, 78 insertions(+)
>>  create mode 100755 tests/overlay/002
>>  create mode 100644 tests/overlay/002.out
>>
>> diff --git a/tests/overlay/002 b/tests/overlay/002
>> new file mode 100755
>> index 000..e5aa610
>> --- /dev/null
>> +++ b/tests/overlay/002
>> @@ -0,0 +1,74 @@
>> +#! /bin/bash
>> +# FS QA Test 002
>> +#
>> +# Test that calling fsync against a file using the merged directory does not
>> +# result in a crash nor fails unexpectedly.
>> +#
>> +# This is motivated by a change in overlayfs that resulted in a crash (invalid
>> +# memory access) when the lower or upper directory belonged to a btrfs file
>> +# system.
>> +#
>> +#---
>> +#
>> +# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved.
>> +# Author: Filipe Manana 
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#---
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1 # failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> + cd /
>> + rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# remove previous $seqres.full before test
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +_supported_fs generic
>
> The supported fs should be "overlay"? overlay/001 has the same issue
> though.

Yeah, I copied it from 001. And that's a question I asked myself but
later forgot to investigate. Since you authored test 001, can you
confirm if it's a mistake or is it really supposed to be 'overlay'?

Thanks

>
> Looks good to me otherwise, test passed with XFS as underlying fs and
> test crashed v4.5 kernel with btrfs as underlying fs, as expected.
>
> Reviewed-by: Eryu Guan 


Re: [PATCH v2] fstests: test fsync on overlayfs merged directory

2016-03-22 Thread Eryu Guan
On Tue, Mar 22, 2016 at 11:07:06AM +, Filipe Manana wrote:
> On Tue, Mar 22, 2016 at 3:00 AM, Eryu Guan  wrote:
...
> >> +
> >> +# real QA test starts here
> >> +_supported_fs generic
> >
> > The supported fs should be "overlay"? overlay/001 has the same issue
> > though.
> 
> Yeah, I copied it from 001. And that's a question I asked myself but
> later forgot to investigate. Since you authored test 001, can you
> confirm if it's a mistake or is it really supposed to be 'overlay'?

It's my mistake in overlay/001, it should be overlay.

Thanks,
Eryu


[PATCH v3] fstests: test fsync on overlayfs merged directory

2016-03-22 Thread fdmanana
From: Filipe Manana 

Test that calling fsync against a file using the merged directory does
not result in a crash nor fails unexpectedly.

This is motivated by a change in overlayfs that resulted in a crash
(invalid memory access) when the lower or upper directory belonged to
a btrfs file system. The overlayfs change came in commit 4bacc9c9234
(overlayfs: Make f_path always point to the overlay and f_inode to the
underlay). At the moment there are two patches in the linux-fsdevel
and linux-btrfs mailing lists to fix this problem:

  * vfs: add file_dentry()
  * Btrfs: fix crash/invalid memory access on fsync when using overlayfs

Signed-off-by: Filipe Manana 
Reviewed-by: Eryu Guan 
---

V2: Removed leftover comment from debugging.
V3: Change supported fs type from generic to overlay.

 tests/overlay/002 | 74 +++
 tests/overlay/002.out |  3 +++
 tests/overlay/group   |  1 +
 3 files changed, 78 insertions(+)
 create mode 100755 tests/overlay/002
 create mode 100644 tests/overlay/002.out

diff --git a/tests/overlay/002 b/tests/overlay/002
new file mode 100755
index 000..ec7874e
--- /dev/null
+++ b/tests/overlay/002
@@ -0,0 +1,74 @@
+#! /bin/bash
+# FS QA Test 002
+#
+# Test that calling fsync against a file using the merged directory does not
+# result in a crash nor fails unexpectedly.
+#
+# This is motivated by a change in overlayfs that resulted in a crash (invalid
+# memory access) when the lower or upper directory belonged to a btrfs file
+# system.
+#
+#---
+#
+# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+_supported_fs overlay
+_supported_os Linux
+_require_scratch
+
+# Remove all files from previous tests
+_scratch_mkfs
+
+# Create our test file.
+lowerdir=$SCRATCH_DEV/$OVERLAY_LOWER_DIR
+mkdir -p $lowerdir
+touch $lowerdir/foobar
+
+_scratch_mount
+
+# Write some data to our file and fsync it, using the merged directory path.
+# This should work and not result in a crash.
+$XFS_IO_PROG -c "pwrite 0 64k" -c "fsync" $SCRATCH_MNT/foobar | _filter_xfs_io
+
+# success, all done
+status=0
+exit
diff --git a/tests/overlay/002.out b/tests/overlay/002.out
new file mode 100644
index 000..666e61e
--- /dev/null
+++ b/tests/overlay/002.out
@@ -0,0 +1,3 @@
+QA output created by 002
+wrote 65536/65536 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/overlay/group b/tests/overlay/group
index 5056c3b..84d164e 100644
--- a/tests/overlay/group
+++ b/tests/overlay/group
@@ -4,3 +4,4 @@
 # - comment line before each group is "new" description
 #
 001 auto quick copyup
+002 auto quick metadata
-- 
2.7.0.rc3



Re: [PATCH 13/13] btrfs: optimize check for stale device

2016-03-22 Thread David Sterba
On Fri, Feb 19, 2016 at 03:10:16PM +0800, Anand Jain wrote:
> > I see crashes with btrfs/011 on a non-debugging config
> >
> > [  641.714363] BUG: unable to handle kernel NULL pointer dereference at 0068
> > [  641.716057] IP: [] scrub_setup_ctx.isra.19+0x1f6/0x260 [btrfs]
> > [  641.717036] PGD 720c1067 PUD 720c2067 PMD 0
> > [  641.717749] Oops:  [#1] PREEMPT SMP
> ::
> > [  641.723163] CPU: 0 PID: 27766 Comm: btrfs Not tainted 4.5.0-rc3-next-20160212-1.g38290f0-vanilla #1
> > [  641.724420] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
> > [  641.725723] task: 8800742481c0 ti: 880071d1 task.ti: 880071d1
> > [  641.726954] RIP: 0010:[]  [] scrub_setup_ctx.isra.19+0x1f6/0x260 [btrfs]
> > [  641.728404] RSP: 0018:880071d13ce8  EFLAGS: 00010202
> > [  641.729413] RAX: 88007231e800 RBX: 88007231e800 RCX:
> > [  641.730610] RDX: a0195638 RSI: a017c5a8 RDI: 88007231ea80
> > [  641.731832] RBP: 880071d13d18 R08:  R09: 88007204ea00
> > [  641.733085] R10: 0008 R11:  R12:
> > [  641.734307] R13: 0001 R14: 88007231e9f8 R15: 003f
> > [  641.735544] FS:  7f03ed36d8c0() GS:88007fc0() knlGS:
> > [  641.736883] CS:  0010 DS:  ES:  CR0: 80050033
> > [  641.738022] CR2: 0068 CR3: 720c CR4: 06f0
> > [  641.739325] Stack:
> > [  641.740156]  8800724d4000 8800724d4000  8800722ef000
> > [  641.741735]   8800724d4fc8 880071d13d98 a01566fd
> > [  641.743163]  88007b127000 0019 8800724d4ce8
> > [  641.744599] Call Trace:
> > [  641.745553]  [] btrfs_scrub_dev+0x13d/0x510 [btrfs]
> > [  641.746894]  [] btrfs_dev_replace_start+0x279/0x3f0 [btrfs]
> > [  641.748282]  [] btrfs_ioctl+0x1869/0x2070 [btrfs]
> > [  641.749587]  [] ? pte_alloc_one+0x33/0x40
> > [  641.750850]  [] do_vfs_ioctl+0x96/0x590
> > [  641.752128]  [] ? __do_page_fault+0x181/0x450
> > [  641.753432]  [] SyS_ioctl+0x79/0x90
> > [  641.754663]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
> > [  641.756037] Code: 00 48 c7 c2 38 56 19 a0 48 c7 c6 a8 c5 17 a0 e8 21 39 f7 e0 45 85 ed 48 c7 83 68 02 00 00 00 00 00 00 48 89 d8 0f 84 03 ff ff ff <49> 83 7c 24 68 00 74 40 c7 83 78 02 00 00 20 00 00 00 4c 89 a3
> > [  641.760392] RIP  [] scrub_setup_ctx.isra.19+0x1f6/0x260 [btrfs]
> > [  641.761970]  RSP 
> > [  641.763190] CR2: 0068
> > [  641.767218] ---[ end trace f46d4e6a90bda310 ]---
> >
> > the dereference happens at offset 0x68 which matches bdev in
> > btrfs_device, so this patch is my best guess at the moment. I'm not able
> > to reproduce it directly so I need to wait for a rebuild and repeat.
> 
> 
> Looks like dev was fine when find_device was called, but
> later it was null when ->bdev was accessed.
> 
> I couldn't reproduce here. There are 10 workouts within btrfs/011,
> any idea which workout caused this? As of now I am guessing..
> 
>    workout "-m dup -d single" 1 cancel quick
> 
> digging more.

I have not been able to reproduce the crash since. All ok on a physical machine,
in a virtual machine in kvm the test runs for a long time and then
freezes (serial console, ssh). The kvm process eats 100% cpu, not
possible to debug it directly. The branch stays in my for-next and is
on the way to 4.7, we'll see if we can reproduce it.


Re: [PATCH V15 00/15] Btrfs: Subpagesize-blocksize: Allow I/O on blocks whose size is less than page size

2016-03-22 Thread Chandan Rajendra
On Tuesday 22 Mar 2016 12:04:23 David Sterba wrote:
> On Thu, Feb 11, 2016 at 11:17:38PM +0530, Chandan Rajendra wrote:
> > this patchset temporarily disables the commit
> > f82c458a2c3ffb94b431fc6ad791a79df1b3713e.
> > 
> > The commits for the Btrfs kernel module can be found at
> > https://github.com/chandanr/linux/tree/btrfs/subpagesize-blocksize.
> 
> The branch does not apply cleanly to at least 4.5, I've tried to rebase
> it but there are conflicts that are not simple. Please update it on top
> of current master, ie. with the preparatory patchset merged.

Hi David,

I will rebase the branch and post the patchset soon.

-- 
chandan



Unable to rescue super-recover and/or restore

2016-03-22 Thread Thomas Kuther
Hi everybody,

I could use some advice on recovering a sort of broken filesystem
(power cycle death). It seems the first superblock is damaged, but I'm
unable to recover from a good one.

Running on a livecd with latest kernel and btrfs-tools I could get my
hands on, this is all the info I can give:


# uname -a
Linux archiso 4.4.1-2-ARCH #1 SMP PREEMPT Wed Feb 3 13:12:33 UTC 2016 x86_64 GNU/Linux

# btrfs version
btrfs-progs v4.5

# fdisk -l /dev/sda
Disk /dev/sda: 119.2 GiB, 128035676160 bytes, 250069680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 8A4EA6CE-3E19-48DC-B6D3-A01F22F6EEC3

Device     Start       End   Sectors   Size Type
/dev/sda1   2048      6143      4096     2M BIOS boot
/dev/sda2   6144 250069646 250063503 119.2G Linux filesystem

# btrfs-show-super -a /dev/sda2
superblock: bytenr=65536, device=/dev/sda2
-
ERROR: bad magic on superblock on /dev/sda2 at 65536

superblock: bytenr=67108864, device=/dev/sda2
-
csum 0xaf86e706 [match]
bytenr 67108864
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
fsid 4fe9d531-bef6-431d-b6be-ed68df8023bc
label
generation 1176204
root 42795008
sys_array_size 226
chunk_root_generation 158914
root_level 1
chunk_root 20971520
chunk_root_level 0
log_root 0
log_root_transid 0
log_root_level 0
total_bytes 128032509952
bytes_used 79141912576
sectorsize 4096
nodesize 16384
leafsize 16384
stripesize 4096
root_dir 6
num_devices 1
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x69
( MIXED_BACKREF |
  COMPRESS_LZO |
  BIG_METADATA |
  EXTENDED_IREF )
csum_type 0
csum_size 4
cache_generation 1176204
uuid_tree_generation 1176204
dev_item.uuid e0474a00-70fe-4aee-9dc4-467966f4998c
dev_item.fsid 4fe9d531-bef6-431d-b6be-ed68df8023bc [match]
dev_item.type 0
dev_item.total_bytes 128032509952
dev_item.bytes_used 128032505856
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 1
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0

To my understanding, that backup superblock looks healthy (I might be
wrong though, just looking at the csum MATCH). So trying to recover
from it:

# btrfs check -s1 /dev/sda2
using SB copy 1, bytenr 67108864
checksum verify failed on 42795008 found 8FF848E2 wanted 
checksum verify failed on 42795008 found 8FF848E2 wanted 
checksum verify failed on 42795008 found 8FF848E2 wanted 
checksum verify failed on 42795008 found 8FF848E2 wanted 
bytenr mismatch, want=42795008, have=18446744073709551615
Couldn't read tree root
Couldn't open file system

# btrfs rescue super-recover -v /dev/sda2
All Devices:
Device: id = 1, name = /dev/sda2

ctree.h:2068: btrfs_super_csum_size: Assertion `t >= ARRAY_SIZE(btrfs_csum_sizes)` failed.
btrfs[0x436f6a]
btrfs(btrfs_recover_superblocks+0x19c)[0x43712c]
btrfs(main+0x82)[0x40adf2]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7fd0ce8fa710]
btrfs(_start+0x29)[0x40aef9]


# btrfs-select-super -s1 /dev/sda2
checksum verify failed on 42795008 found 8FF848E2 wanted 
checksum verify failed on 42795008 found 8FF848E2 wanted 
checksum verify failed on 42795008 found 8FF848E2 wanted 
checksum verify failed on 42795008 found 8FF848E2 wanted 
bytenr mismatch, want=42795008, have=18446744073709551615
Couldn't read tree root
Open ctree failed


Trying to get the data off using restore, no surprise:

# btrfs restore -u1 -D /dev/sda2 /mnt
checksum verify failed on 42795008 found 8FF848E2 wanted 
checksum verify failed on 42795008 found 8FF848E2 wanted 
checksum verify failed on 42795008 found 8FF848E2 wanted 
checksum verify failed on 42795008 found 8FF848E2 wanted 
bytenr mismatch, want=42795008, have=18446744073709551615
Couldn't read tree root
Could not open root, trying backup super
Superblock bytenr is larger than device size
Could not open root, trying backup super


Do you have any advice on how to proceed here?

Best regards,
Thomas
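
The "Superblock bytenr is larger than device size" line in the restore output
above is probably not a sign of further damage. As far as I know, btrfs keeps
its superblock copies at fixed offsets (64KiB, 64MiB and 256GiB), so the third
copy simply does not exist on a 119.2G partition. A small sketch of the offset
calculation, modelled on btrfs_sb_offset() from btrfs-progs; the function
names sb_offset and sb_fits and the local constants are made up for this
example:

```c
#include <stdint.h>

#define SUPER_INFO_OFFSET  (64ULL * 1024)  /* primary superblock at 64KiB */
#define SUPER_INFO_SIZE    4096ULL         /* size of one superblock copy */
#define SUPER_MIRROR_SHIFT 12

/* Byte offset of superblock copy 'mirror' (0 = primary, 1 and 2 = backups). */
static uint64_t sb_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	if (mirror)
		return start << (SUPER_MIRROR_SHIFT * mirror);
	return SUPER_INFO_OFFSET;
}

/* Non-zero if copy 'mirror' lies entirely within a device of dev_bytes. */
static int sb_fits(int mirror, uint64_t dev_bytes)
{
	return sb_offset(mirror) + SUPER_INFO_SIZE <= dev_bytes;
}
```

With the total_bytes value from the btrfs-show-super output (128032509952),
copies 0 and 1 fit on the device and copy 2 does not, which matches the two
bytenr values (65536 and 67108864) that the tool printed.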


Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework

2016-03-22 Thread David Sterba
On Tue, Mar 22, 2016 at 09:35:25AM +0800, Qu Wenruo wrote:
> This updated version of inband de-duplication has the following features:
> 1) ONE unified dedup framework.
> 2) TWO different back-end with different trade-off

The on-disk format is defined in code, would be good to give some
overview here.

> 3) Support compression with dedupe
> 4) Ioctl interface with persist dedup status

I'd like to see the ioctl specified in more detail. So far there's
enable, disable and status. I'd expect some way to control the in-memory
limits, let it "forget" current hash cache, specify the dedupe chunk
size, maybe sync of the in-memory hash cache to disk.

> 5) Ability to disable dedup for given dirs/files

This would be good to extend to subvolumes.

> TODO:
> 1) Add extent-by-extent comparison for faster but more conflicting algorithm
>Current SHA256 hash is quite slow, and for some old(5 years ago) CPU,
>CPU may even be a bottleneck other than IO.
>But for faster hash, it will definitely cause conflicts, so we need
>extent comparison before we introduce new dedup algorithm.

If sha256 is slow, we can use a less secure hash that's faster but will
do a full byte-to-byte comparison in case of hash collision, and
recompute sha256 when the blocks are going to disk. I haven't thought
this through, so there are possibly details that could make it unfeasible.

The idea is to move expensive hashing to the slow IO operations and do
fast but not 100% safe hashing on the read/write side where performance
matters.
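
A rough sketch of the "fast hash plus byte-to-byte comparison" part, just to
make the idea concrete. Nothing here comes from the patchset: the slot table,
the names (weak_hash, dedupe_lookup), the 4096-byte block size and the choice
of FNV-1a as the cheap hash are all assumptions for illustration, and the
sha256 recomputation at writeout time is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  4096
#define TABLE_SLOTS 1024

/* One cached block per slot; a real implementation would chain or evict. */
struct slot {
	const unsigned char *block;
	uint32_t hash;
};

/* 32-bit FNV-1a: cheap, not collision resistant. */
static uint32_t weak_hash(const unsigned char *p, size_t n)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < n; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Return an already cached block with identical contents, or NULL after
 * recording 'block' in the table. A hash match alone is never trusted:
 * the full memcmp() decides whether the two blocks are duplicates.
 */
static const unsigned char *dedupe_lookup(struct slot *table,
					  const unsigned char *block)
{
	uint32_t h = weak_hash(block, BLOCK_SIZE);
	struct slot *s = &table[h % TABLE_SLOTS];

	if (s->block && s->hash == h &&
	    memcmp(s->block, block, BLOCK_SIZE) == 0)
		return s->block;

	s->block = block;
	s->hash = h;
	return NULL;
}
```

The point of the design is that a collision in the cheap hash costs only one
extra memcmp() and can never cause two different blocks to be merged.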

> 2) Misc end-user related helpers
>Like handy and easy to implement dedup rate report.
>And method to query in-memory hash size for those "non-exist" users who
>want to use 'dedup enable -l' option but didn't ever know how much
>RAM they have.

That's what we should try to know and define in advance, that's part of the
ioctl interface.

I went through the patches, there are a lot of small things to fix, but
first I want to be sure about the interfaces, ie. on-disk and ioctl.

Then we can start to merge the patchset in smaller batches, the
in-memory deduplication does not have implications on the on-disk
format, so it's "just" the ioctl part.

The patches at the end of the series fix bugs introduced within the same
series, these should be folded to the patches that are buggy.


Re: overlay file to test btrfs repairs

2016-03-22 Thread Austin S. Hemmelgarn

On 2016-03-21 13:13, Chris Murphy wrote:
> On Mon, Mar 21, 2016 at 5:22 AM, Austin S. Hemmelgarn wrote:
> > On 2016-03-21 05:55, Duncan wrote:
> > > Chris Murphy posted on Sun, 20 Mar 2016 21:43:52 -0600 as excerpted:
> > >
> > > > Hi folks,
> > > >
> > > > So I just ran into this:
> > > > https://raid.wiki.kernel.org/index.php/
> > > > Recovering_a_failed_software_RAID#Making_the_harddisks_read-
> > > > only_using_an_overlay_file
> > >
> > > [That's a single link, wrapped by my client.]
> > >
> > > > This is a device mapper overlay file - not overlayfs.
> > > >
> > > > For the repairs that are sometimes uncertain what's next, maybe this is
> > > > a viable option to avoid changing the file system? I'm thinking
> > > > chunk-recover might take up too much space, I'm not sure how that one
> > > > works, if chunks are just being read or if they have to be rewritten or
> > > > if it's just the chunk tree? But for 'btrfs check' and 'btrfs rescue
> > > > super-recover/zero-log' there should be very little being written so the
> > > > overlay idea might be a good step?
> > > >
> > > > Opinions?
> > >
> > > That's a creative and potentially quite useful possible solution to an
> > > often hairy problem.  Thanks for bringing it up. =:^)
> > >
> > > Provided Hugo and the devs don't find major fault with the idea, linking
> > > that from appropriate locations (as a possible solution in the Problem
> > > FAQ is the first one that occurs to me) in the btrfs wiki could be quite
> > > useful, to many.
> >
> > If we could find some way to have the programs themselves do this if the
> > system supports it (and the user opts in of course), it would be really
> > helpful.  That said, I can see this possibly causing issues due to duplicate
> > device UUID's.
>
> I thought of this. Btrfs seed device. The problem is it has some
> minimal requirements (that I don't understand) for file system
> integrity, probably starting out with the superblocks all being in a
> good state. So literal leveraging of seed device is not possible, and
> it's also non-obvious. Any repairs should be fail safe or they're
> arguably broken. But if there were a way to effectively setup a seed +
> ram or file based device behind the scene so that repairs can be
> tested, that might be useful. And it would be mountable, even rw, and
> that too would be reversible.

OTOH, if we could add some way to tell the code (both userspace and 
in-kernel) to explicitly ignore specific devices when trying to assemble 
filesystems, that would allow us to use DM snapshots (or something 
similar) to do this, and would also allow people to work around the UUID 
issues when dealing with LVM snapshots (or similar situations).




Re: [RFC] Experimental btrfs encryption

2016-03-22 Thread David Sterba
On Thu, Mar 03, 2016 at 09:58:53AM +0800, Anand Jain wrote:
> . (I received couple of private emails on this, so looks like
>   I confused you and I'm writing again to clear the air on this).
> 
> > - Uses btrfs compression framework, so compression and then
> >encryption is not possible. However yet evaluate if there
> >are encryption algorithm which can compress as well.
> 
>   It should be compression and then encryption. I didn't mean to say
>   the other way around. However the btrfs encoding framework is
>   designed to handle any one of it in an elegant manner. So as of
>   user can configure either encryption OR compression by design.
> 
>   Further for users who are looking for both compression and encryption.
>   There are two ways that we could implement in future in the btrfs.
>   One enhance the btrfs encoding framework so that it can accommodate
>   two cascaded engines like compression and followed by encryption.
>   Or (mostly) a better approach would be to evaluate a single encoding
>   engine (algorithm) which can do both (compression and then encryption).
>   Which I think will be less invasive within btrfs, and probably be more
>   efficient.
> 
>   Hope I sound clearer now. Sorry if I wasn't before.
> 
> . I have put up a doc here:
> 
> https://docs.google.com/document/d/1fq9snDM_4ikn44UDNErjHqKXgZHukiJWS4Il3qVhm3M/edit?usp=sharing

It's just text, you could also send it as part of the patchset.

I'd very much like to see the crypto part covered in more detail, the
threat model, what's the right cipher to use and why. You've chosen
something that would demonstrate how it works, but eg. implementing the
AEAD mode would not be straightforward to add, while it would be a
significant difference against the current per-file encryption (we can
store per-block associated data cheaply). And there are certainly other
questions that I've missed.

The per-file encryption is on the way to the VFS layer, so we can
implement it in btrfs afterwards. Then the benefit of the proposed
patchset would be encrypted data on subvolume with snapshotting enabled.
That's probably something that people want and covers common use cases.
But we can go further, like full subvolume metadata encryption.
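
On the quoted point that it has to be compression first and encryption
second: a rough illustration of why the order matters. Random bytes from
os.urandom stand in for ciphertext here (an assumption, no real cipher is
used), and zlib stands in for the btrfs compressors.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size of the buffer after zlib compression at the default level."""
    return len(zlib.compress(data))


# Repetitive plaintext, similar in spirit to typical file data.
plain = b"btrfs extent data " * 4096

# Stand-in for encrypted data: good ciphertext looks like random bytes.
ciphertext_like = os.urandom(len(plain))

plain_ratio = compressed_size(plain) / len(plain)
cipher_ratio = compressed_size(ciphertext_like) / len(ciphertext_like)
```

Compressing after encryption gains nothing because the input no longer
has any redundancy left, so a cascaded engine would have to compress
first.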


Re: Btrfsck memory usage reduce idea

2016-03-22 Thread David Sterba
On Mon, Mar 21, 2016 at 10:15:55AM +0800, Qu Wenruo wrote:
> > IOW, there will be two options for the use to choose from, right? That's
> > what I'd expect. Be able to check the filesystem on a machine with less
> > memory at the cost of IO, but also do the faster check on a different
> > machine.
> 
> I was planning to use the new extent tree check to replace current one, 
> as a rework.
> Am I always reworking things? :)

The problem with big reworks is that there are few people willing to
review them. So I'm not against doing such changes, especially in this
case it would be welcome, but I'm afraid that it could end up stalled
similar to the convert rewrite.

> The point that I didn't want to keep the current behavior is, the old 
> one is just OK or OOM, no one would know if it will OOM until it happens.
> 
> But the new one would be much flex than current behavior.
> As it fully uses the IO cache provided by kernel.

That's a good point, for the single implementation.


Re: [PATCH] btrfs-progs: build: fix static standalone utilities

2016-03-22 Thread David Sterba
On Mon, Mar 21, 2016 at 10:16:07AM -0400, Noah Massey wrote:
> commit b5e7979 "btrfs-progs: build: extend per-binary objects" allows
> the standalone utilities to link against object files shared with the
> main binary. However, the btrfs-*.static targets need to be adjusted
> to build against the static versions of the common files.
> 
> Signed-off-by: Noah Massey 

Applied, thanks. I'll release a .1 version probably next week as there
are more build fixes accumulated.


Re: [PATCH] btrfs-progs: utils: make sure set_label_mounted uses correct length buffers

2016-03-22 Thread David Sterba
On Tue, Mar 22, 2016 at 03:40:29AM -0700, Petros Angelatos wrote:
> When `btrfs filesystem label /foo bar` command is invoked, it will pass
> the buffer allocated in the argv array directly to set_label_mounted()
> and then to the BTRFS_IOC_SET_FSLABEL ioctl.
> 
> However, the kernel code handling the ioctl will always try to copy
> BTRFS_LABEL_SIZE bytes[1] from the userland pointer. Under certain
> conditions and when the label is small enough, the command will fail
> with:
> 
> [root@localhost /]# btrfs filesystem label /mnt f
> ERROR: unable to set label Bad address

Good catch, I was not aware of the hidden requirement.

> Fix this by making sure we pass a BTRFS_LABEL_SIZE sized buffer to the
> ioctl containing the desired label.
> 
> [1] 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/ioctl.c?id=refs/tags/v4.5#n5231
> 
> Signed-off-by: Petros Angelatos 

Applied, thanks.
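
For readers following along, the shape of the fix can be sketched as
below. This is not the applied patch, which is C; it only illustrates the
technique of always handing the kernel a full-size, NUL-padded buffer.
BTRFS_LABEL_SIZE is 256 in the kernel UAPI header; the helper name is
hypothetical.

```python
BTRFS_LABEL_SIZE = 256  # size the kernel copies for BTRFS_IOC_SET_FSLABEL


def make_label_buffer(label: str) -> bytes:
    """Return a buffer of exactly BTRFS_LABEL_SIZE bytes holding the label.

    The label is NUL-padded, so a short label such as "f" no longer leaves
    the kernel reading past the end of a tiny argv string.
    """
    raw = label.encode("utf-8")
    if len(raw) >= BTRFS_LABEL_SIZE:
        # One byte must remain for the terminating NUL.
        raise ValueError("label too long: %d bytes" % len(raw))
    return raw.ljust(BTRFS_LABEL_SIZE, b"\0")
```

The resulting buffer is what would be passed to the ioctl; the call
itself is left out of the sketch.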


Re: [PATCH 0/6 v3] btrfs-progs: subvolume functions reorg

2016-03-22 Thread David Sterba
On Mon, Mar 21, 2016 at 03:20:59PM +0800, Anand Jain wrote:
> We need subvolume helper functions easily accessible for features
> around subvolume. This patch set is just a cleanup of subvolume
> functions.
> 
> This is tested fine with fstests group subvol and subvol hand tests.
> 
> v3: Separate changes into 6 commits,
> Drops the idea of creating subvolume.c, instead update utils.c.
> Adds 1/5 a minor func mv and 3/5 remove of a duplicate func and
> 6/6 rename of get_subvol_name as per review comment.
> 
> v2: Update commit log. Separate this patch from the encryption
> patch set.
> 
> 
> Anand Jain (6):
>   btrfs-progs: spatial rearrange subvolume functions together
>   btrfs-progs: move test_issubvolume() to utils.c
>   btrfs-progs: remove duplicate function __is_subvol()
>   btrfs-progs: move get_subvol_name() to utils.c
>   btrfs-progs: create get_subvol_info()
>   btrfs-progs: rename get_subvol_name() to subvol_minus_mnt()

Applied, thanks.


Re: [PATCH 13/13] btrfs: optimize check for stale device

2016-03-22 Thread Anand Jain



On 03/22/2016 08:21 PM, David Sterba wrote:
> On Fri, Feb 19, 2016 at 03:10:16PM +0800, Anand Jain wrote:
> > > I see crashes with btrfs/011 on a non-debugging config
> > >
> > > [  641.714363] BUG: unable to handle kernel NULL pointer dereference at 0068
> > > [  641.716057] IP: [] scrub_setup_ctx.isra.19+0x1f6/0x260 [btrfs]
> > > [  641.717036] PGD 720c1067 PUD 720c2067 PMD 0
> > > [  641.717749] Oops:  [#1] PREEMPT SMP
> > >
> > > ::
> > >
> > > [  641.723163] CPU: 0 PID: 27766 Comm: btrfs Not tainted 4.5.0-rc3-next-20160212-1.g38290f0-vanilla #1
> > > [  641.724420] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
> > > [  641.725723] task: 8800742481c0 ti: 880071d1 task.ti: 880071d1
> > > [  641.726954] RIP: 0010:[]  [] scrub_setup_ctx.isra.19+0x1f6/0x260 [btrfs]
> > > [  641.728404] RSP: 0018:880071d13ce8  EFLAGS: 00010202
> > > [  641.729413] RAX: 88007231e800 RBX: 88007231e800 RCX:
> > > [  641.730610] RDX: a0195638 RSI: a017c5a8 RDI: 88007231ea80
> > > [  641.731832] RBP: 880071d13d18 R08:  R09: 88007204ea00
> > > [  641.733085] R10: 0008 R11:  R12:
> > > [  641.734307] R13: 0001 R14: 88007231e9f8 R15: 003f
> > > [  641.735544] FS:  7f03ed36d8c0() GS:88007fc0() knlGS:
> > > [  641.736883] CS:  0010 DS:  ES:  CR0: 80050033
> > > [  641.738022] CR2: 0068 CR3: 720c CR4: 06f0
> > > [  641.739325] Stack:
> > > [  641.740156]  8800724d4000 8800724d4000  8800722ef000
> > > [  641.741735]   8800724d4fc8 880071d13d98 a01566fd
> > > [  641.743163]  88007b127000 0019 8800724d4ce8
> > > [  641.744599] Call Trace:
> > > [  641.745553]  [] btrfs_scrub_dev+0x13d/0x510 [btrfs]
> > > [  641.746894]  [] btrfs_dev_replace_start+0x279/0x3f0 [btrfs]
> > > [  641.748282]  [] btrfs_ioctl+0x1869/0x2070 [btrfs]
> > > [  641.749587]  [] ? pte_alloc_one+0x33/0x40
> > > [  641.750850]  [] do_vfs_ioctl+0x96/0x590
> > > [  641.752128]  [] ? __do_page_fault+0x181/0x450
> > > [  641.753432]  [] SyS_ioctl+0x79/0x90
> > > [  641.754663]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
> > > [  641.756037] Code: 00 48 c7 c2 38 56 19 a0 48 c7 c6 a8 c5 17 a0 e8 21 39 f7 e0 45 85 ed 48 c7 83 68 02 00 00 00 00 00 00 48 89 d8 0f 84 03 ff ff ff <49> 83 7c 24 68 00 74 40 c7 83 78 02 00 00 20 00 00 00 4c 89 a3
> > > [  641.760392] RIP  [] scrub_setup_ctx.isra.19+0x1f6/0x260 [btrfs]
> > > [  641.761970]  RSP
> > > [  641.763190] CR2: 0068
> > > [  641.767218] ---[ end trace f46d4e6a90bda310 ]---
> > >
> > > the dereference happens at offset 0x68 which matches bdev in
> > > btrfs_device, so this patch is my best guess at the moment. I'm not able
> > > to reproduce it directly so I need to wait for a rebuild and repeat.
> >
> > Looks like dev was fine when find_device was called, but
> > later it was null when ->bdev was accessed.
> >
> > I couldn't reproduce here. There are 10 workouts within btrfs/011
> > any idea workout caused this? As of now I am guessing..
> >
> > workout "-m dup -d single" 1 cancel quick
> >
> > digging more.
>
> I was not able reproduce the crash since. All ok on a physical machine,
> in a virtual machine in kvm the test runs for a long time and then
> freezes (serial console, ssh). The kvm process eats 100% cpu, not
> possible to debug it directly. The branch stays in my for-next and is
> on the way to 4.7, we'll see if we can reproduce it.

Agreed. Thanks Dave.


[PATCH] btrfs-progs: fix build of standalone utilities after clean

2016-03-22 Thread David Sterba
$ make clean
$ make btrfs-debug-tree

will fail because the dependency from $(btrfs_debug_tree_objects) is
missing. The variable standalone_deps magically collects all the deps
and will build them in advance. The simple fix to use the existing
substitution based on $@ does not work for pattern rules, as Noah found
out.

Reported-by: Noah Massey 
Signed-off-by: David Sterba 
---
 Makefile.in | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index 71ef76d4fd4e..0a1aece70dab 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -130,6 +130,9 @@ btrfs_debug_tree_objects = cmds-inspect-dump-tree.o
 btrfs_show_super_objects = cmds-inspect-dump-super.o
 btrfs_calc_size_objects = cmds-inspect-tree-stats.o
 
+# collect values of the variables above
+standalone_deps = $(foreach dep,$(patsubst %,%_objects,$(subst -,_,$(filter btrfs-%, $(progs)))),$($(dep)))
+
 SUBDIRS =
 BUILDDIRS = $(patsubst %,build-%,$(SUBDIRS))
 INSTALLDIRS = $(patsubst %,install-%,$(SUBDIRS))
@@ -256,14 +259,14 @@ $(libs_static): $(libbtrfs_objects)
 # For static variants, use an extra $(subst) to get rid of the ".static"
 # from the target name before translating to list of libs
 
-btrfs-%.static: $(static_objects) btrfs-%.static.o $(static_libbtrfs_objects)
+btrfs-%.static: $(static_objects) btrfs-%.static.o $(static_libbtrfs_objects) $(patsubst %.o,%.static.o,$(standalone_deps))
@echo "[LD] $@"
$(Q)$(CC) $(STATIC_CFLAGS) -o $@ $@.o $(static_objects) \
$(patsubst %.o, %.static.o, $($(subst -,_,$(subst .static,,$@)-objects))) \
$(static_libbtrfs_objects) $(STATIC_LDFLAGS) \
$($(subst -,_,$(subst .static,,$@)-libs)) $(STATIC_LIBS)
 
-btrfs-%: $(objects) $(libs_static) btrfs-%.o
+btrfs-%: $(objects) $(libs_static) btrfs-%.o $(standalone_deps)
@echo "[LD] $@"
$(Q)$(CC) $(CFLAGS) -o $@ $(objects) $@.o \
$($(subst -,_,$@-objects)) \
-- 
2.7.1
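
For anyone not fluent in make functions, the name mangling that
standalone_deps relies on can be mimicked as follows. This is only an
illustration of the substitution chain (filter btrfs-%, subst -,_ and
patsubst %,%_objects, then expanding the resulting variable names); the
function and argument names are made up for the example.

```python
def standalone_deps(progs, variables):
    """Collect the object lists of all btrfs-* standalone programs.

    progs     -- list of program names, like $(progs)
    variables -- dict mapping make variable names to their string values
    """
    deps = []
    for prog in progs:
        # $(filter btrfs-%, ...): keep only names starting with "btrfs-"
        if not prog.startswith("btrfs-"):
            continue
        # $(subst -,_,...) followed by $(patsubst %,%_objects,...)
        var = prog.replace("-", "_") + "_objects"
        # $($(dep)): an undefined make variable expands to nothing
        deps.extend(variables.get(var, "").split())
    return deps
```

With progs containing btrfs-debug-tree, the lookup key becomes
btrfs_debug_tree_objects, which is why a clean tree now builds
cmds-inspect-dump-tree.o before linking.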



Re: overlay file to test btrfs repairs

2016-03-22 Thread Duncan
Austin S. Hemmelgarn posted on Tue, 22 Mar 2016 10:21:57 -0400 as
excerpted:

> OTOH, if we could add some way to tell the code (both userspace and
> in-kernel) to explicitly ignore specific devices when trying to assemble
> filesystems, that would allow us to use DM snapshots (or something
> similar) to do this, and would also allow people to work around the UUID
> issues when dealing with LVM snapshots (or similar situations).

That's a good idea, but minor detail, it'd need to resolve to specific 
block-device major:minor comparison; it couldn't be a simple device-path 
blacklist, because device paths are routinely symlinked.

I guess that's obvious from a kernel dev perspective, but perhaps not so 
much from an admin-user perspective, where the device-path /is/ often 
considered the device.
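
To make the distinction concrete, here is a small userspace sketch of
comparing devices by major:minor instead of by path. It is only an
illustration of the comparison, not a proposal for the actual ignore-list
interface, and the function names are hypothetical.

```python
import os
import stat


def device_numbers(path):
    """Return (major, minor) of the device node that path resolves to."""
    st = os.stat(path)  # os.stat follows symlinks, unlike os.lstat
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError("%s is not a device node" % path)
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def same_device(path_a, path_b):
    """True if both paths name the same device, whatever the paths look like."""
    return device_numbers(path_a) == device_numbers(path_b)
```

Two different paths, for example a /dev/disk/by-id symlink and the /dev
node it points to, compare equal this way, while a plain string
comparison of the paths would miss the match.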

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: csum errors in VirtualBox VDI files

2016-03-22 Thread Kai Krakow
Am Tue, 22 Mar 2016 16:47:10 +0800
schrieb Qu Wenruo :

> Hi,
> 
> Kai Krakow wrote on 2016/03/22 09:03 +0100:
> > Hello!
> >
> > Since one of the last kernel updates (I don't know which exactly),
> > I'm experiencing csum errors within VDI files when running
> > VirtualBox. A side effect of this is, as soon as dmesg shows these
> > errors, commands like "du" and "df" hang until reboot.
> >
> > I've now restored the file from backup but it happens over and over
> > again.
> >
> > On another machine I'm also seeing errors with big files in the
> > following scenario (apparently an older kernel, 4.1.x I afair):
> >
> > # ntfsclone --save /dev/md126p2 -o rescue.ntfs.img
> > ^ big NTFS partition   ^ file on btrfs
> >
> > results in a write error and the file system goes read-only.  
> 
> When it goes RO, it must have some warning in kernel log.
> Would you please paste the kernel log?

Apparently, that system does not boot now due to errors in the bcache
b-tree. That being said, it may well be some bcache error and not
btrfs' fault. Apparently I couldn't catch the output, I was in a
hurry. It said "write error" and had some backtrace. I will come back
to this later.

Let's go to the system I currently care about (that one with the
always breaking VDI file):

> > Both systems have in common they are using btrfs on bcache with
> > compress=lzo,autodefrag,nossd,discard (mraid=1,draid=0 and
> > mraid=1,draid=single).
> >
> > The system mentioned first is running Kernel 4.5.0 with Gentoo
> > patch-set. I upgraded from the last 4.4.x kernel when I first
> > experienced this problem. The first time the problem resulted in a
> > duplicate extent which btrfsck wasn't able to fix, that's when I
> > first restored from backup. But now I'm getting csum errors in this
> > file over a over again, plus when rsync has run for backup, the
> > system no longer responds to "du" and "df" commands - it just hangs.
> >
> > Known problem? Does it help if I send debug info? If so, please
> > instruct.
> >  
> Does btrfs check report anything wrong?

After the error occurred?

Yes, some text about the extent being compressed and btrfs repair
doesn't currently handle that case (I tried --repair as I'm having a
backup). I simply decided not to investigate that further at that point
but delete and restore the affected file from backup. However, this is
the message from dmesg (tho, I didn't catch the backtrace):

btrfs_run_delayed_refs:2927: errno=-17 Object already exists

After this, the system went RO and I had to reboot. I ran btrfs check
and it told about a duplicate extent. I identified the file (using
btrfs inspect and the inode number) being the VDI file, and restored it.
Afterwards, I upgraded from latest 4.4 to 4.5. Since this incident I've
been watching more closely, and the file becomes damaged without any
message in the kernel log when VirtualBox does more IO than usual.
When my backup script then runs over the file, I get
errors about missing csums - the block is not readable. I now ran
ddrescue, and replaced the file to get a current and slightly damaged
VDI image back (my backup uses time rotation, so no problem). But
running chkdsk in VirtualBox damages the VDI again.

Regarding the other error on the other machine, I'm not completely
convinced bcache ain't involved in this problem.

As soon as I "produced" csum errors again, I'll run btrfs check. Or
should I do it now without forcing the csum error to occur?


-- 
Regards,
Kai

Replies to list-only preferred.



Re: csum errors in VirtualBox VDI files

2016-03-22 Thread Chris Murphy
This is kinda confusing.

So the gist is that the guest OS is Windows, so the VDI contains an
NTFS file system. Correct? And that VDI is on a Btrfs formatted bcache
device. Correct? Does the VDI have +C xattr set?


Chris Murphy


Re: csum errors in VirtualBox VDI files

2016-03-22 Thread Henk Slager
On Tue, Mar 22, 2016 at 9:03 AM, Kai Krakow  wrote:
> Hello!
>
> Since one of the last kernel updates (I don't know which exactly), I'm
> experiencing csum errors within VDI files when running VirtualBox. A
> side effect of this is, as soon as dmesg shows these errors, commands
> like "du" and "df" hang until reboot.
>
> I've now restored the file from backup but it happens over and over
> again.
>
> On another machine I'm also seeing errors with big files in the
> following scenario (apparently an older kernel, 4.1.x I afair):
>
> # ntfsclone --save /dev/md126p2 -o rescue.ntfs.img
>^ big NTFS partition   ^ file on btrfs
>
> results in a write error and the file system goes read-only.
>
> Both systems have in common they are using btrfs on bcache with
> compress=lzo,autodefrag,nossd,discard (mraid=1,draid=0 and
> mraid=1,draid=single).

autodefrag,nossd: I am using these too on bcached btrfs, so far with no
issues (including kernel 4.5.0). I have been writing big files (> 50G)
without problems.
compress=lzo: I have used it on bcached btrfs in the past, and it worked fine.
discard: I am not sure what this should do on top of /dev/bcacheX ; I
don't know how this relates to the bcache discard; I currently have
bcache setting:  Discard?  False

I am not saying that the last mount option is the direct cause of the
problems you experience, it is just that I don't know its impact
currently.

> The system mentioned first is running Kernel 4.5.0 with Gentoo
> patch-set. I upgraded from the last 4.4.x kernel when I first
> experienced this problem. The first time the problem resulted in a
> duplicate extent which btrfsck wasn't able to fix, that's when I first
> restored from backup. But now I'm getting csum errors in this file over
> a over again, plus when rsync has run for backup, the system no longer
> responds to "du" and "df" commands - it just hangs.
>
> Known problem? Does it help if I send debug info? If so, please
> instruct.
>
> --
> Regards,
> Kai
>
> Replies to list-only preferred.
>


RAID Assembly with Missing Empty Drive

2016-03-22 Thread John Marrett
I recently had a drive failure in a file server running btrfs. The
failed drive was completely non-functional. I added a new drive to the
filesystem successfully, when I attempted to remove the failed drive I
encountered an error. I discovered that I actually experienced a dual
drive failure, the second drive only exhibited as failed when btrfs
tried to write to the drives in the filesystem when I removed the
disk.

I shut down the array and imaged the failed drive using GNU ddrescue,
I was able to recover all but a few kb from the drive. Unfortunately,
when I imaged the drive I overwrote the drive that I had successfully
added to the filesystem.

This brings me to my current state, I now have two devices missing:

 - the completely failed drive
 - the empty drive that I overwrote with the second failed disk's image

Consequently I can't start the filesystem. I've discussed the issue in
the past with Ke and other people on the #btrfs channel. The
consensus, as I understood it, is that with the right patch it should
be possible either to mount the array with the empty drive absent or
to create a new btrfs filesystem on an empty drive and then manipulate
its UUIDs so that it believes it's the missing UUID from the existing
btrfs filesystem.

Here's the info showing the current state of the filesystem:

ubuntu@ubuntu:~$ sudo btrfs filesystem show
warning, device 6 is missing
warning devid 6 not found already
warning devid 7 not found already
Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
        Total devices 7 FS bytes used 5.47TiB
        devid    1 size 1.81TiB used 1.71TiB path /dev/sda3
        devid    2 size 1.81TiB used 1.71TiB path /dev/sdb3
        devid    3 size 1.82TiB used 1.72TiB path /dev/sdc1
        devid    4 size 1.82TiB used 1.72TiB path /dev/sdd1
        devid    5 size 2.73TiB used 2.62TiB path /dev/sde1
        *** Some devices missing
btrfs-progs v4.0
ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sda3,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.
ubuntu@ubuntu:~$ dmesg
[...]
[  749.322385] BTRFS info (device sde1): allowing degraded mounts
[  749.322404] BTRFS info (device sde1): disk space caching is enabled
[  749.323571] BTRFS warning (device sde1): devid 6 uuid
f41bcb72-e88a-432f-9961-01307ec291a9 is missing
[  749.335543] BTRFS warning (device sde1): devid 7 uuid
17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing
[  749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378,
flush 0, corrupt 0, gen 0
[  749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0,
corrupt 0, gen 0
[  774.759717] BTRFS: too many missing devices, writeable mount is not allowed
[  774.804053] BTRFS: open_ctree failed

Thank you in advance for your help,

-JohnF


Re: csum errors in VirtualBox VDI files

2016-03-22 Thread Kai Krakow
Am Tue, 22 Mar 2016 13:42:00 -0600
schrieb Chris Murphy :

> This is kinda confusing.

Err yes maybe... ;-D

> So the gist is that the guest OS is Windows, so the VDI contains an
> NTFS file system. Correct? And that VDI is on a Btrfs formatted bcache
> device. Correct? Does the VDI have +C xattr set?

Yes:

VirtualBox runs guest OS "Windows 7 32-bit"
VDI file is stored on btrfs running on bcache device
chattr +C is not set

I used chattr +C until a few weeks ago when I decided to flip
autodefrag back on and realised that +C won't allow compressing file
contents. The VM actually ran faster afterwards. And it ran fine until
lately when those errors occurred.

But I have to admit I didn't use the VM a lot in a while. So the first
time the error occurred may be a few more days back. According to my
snapshot backlog on the backup partition, the last successful backup of
this VDI file was on 2016-03-13 when I was running kernel 4.4.4
according to my emerge log.

-- 
Regards,
Kai

Replies to list-only preferred.



Re: overlay file to test btrfs repairs

2016-03-22 Thread Henk Slager
On Mon, Mar 21, 2016 at 4:43 AM, Chris Murphy  wrote:
> Hi folks,
>
> So I just ran into this:
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
>
> This is a device mapper overlay file - not overlayfs.
>
> For the repairs that are sometimes uncertain what's next, maybe this
> is a viable option to avoid changing the file system? I'm thinking
> chunk-recover might take up too much space, I'm not sure how that one
> works, if chunks are just being read or if they have to be rewritten
> or if it's just the chunk tree? But for 'btrfs check' and 'btrfs
> rescue super-recover/zero-log' there should be very little being
> written so the overlay idea might be a good step?

I used the info via this message:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/54178

to try to fix a RAID10 of 4x 4TB disks (some bad metadata, some nbytes 400 errors).
I used AoE (instead of NBD) to avoid btrfs and the kernel getting
confused by duplicate UUIDs.

I created 4x 10G sparse files, one for each bcached HDD. After the --repair
action had ended (apparently successfully), du reported only 50M size on
disk for each of the sparse files. The fix operation lasted about 1.5
hours. After mounting and unmounting the 'just repaired fs' again, a
subsequent btrfs check still reported the same errors, although
in a different order.
So the nbytes 400 errors actually did not get fixed (while there were
also other errors; this is in accordance with what Qu once noted, but
that was with older tools/kernel).


Re: RAID Assembly with Missing Empty Drive

2016-03-22 Thread Henk Slager
On Tue, Mar 22, 2016 at 9:19 PM, John Marrett  wrote:
> I recently had a drive failure in a file server running btrfs. The
> failed drive was completely non-functional. I added a new drive to the

I assume you did btrfs device add?
Or did you do this with btrfs replace?

> filesystem successfully, when I attempted to remove the failed drive I
> encountered an error. I discovered that I actually experienced a dual
> drive failure, the second drive only exhibited as failed when btrfs
> tried to write to the drives in the filesystem when I removed the
> disk.
>
> I shut down the array and imaged the failed drive using GNU ddrescue,
> I was able to recover all but a few kb from the drive. Unfortunately,
> when I imaged the drive I overwrote the drive that I had successfully
> added to the filesystem.
>
> This brings me to my current state, I now have two devices missing:
>
>  - the completely failed drive
>  - the empty drive that I overwrote with the second failed disks image
>
> Consequently I can't start the filesystem. I've discussed the issue in
> the past with Ke and other people on the #btrfs channel, the
> concensus; as I understood it, is that with the right patch it should
> be possible to mount either the array with the empty drive absent or
> to create a new btrfs fileystem on an empty drive and then manipulate
> its UUIDs so that it believes it's the missing UUID from the existing
> btrfs filesystem.
>
> Here's the info showing the current state of the filesystem:
>
> ubuntu@ubuntu:~$ sudo btrfs filesystem show
> warning, device 6 is missing
> warning devid 6 not found already
> warning devid 7 not found already
> Label: none  uuid: 67b4821f-16e0-436d-b521-e4ab2c7d3ab7
> Total devices 7 FS bytes used 5.47TiB
> devid    1 size 1.81TiB used 1.71TiB path /dev/sda3
> devid    2 size 1.81TiB used 1.71TiB path /dev/sdb3
> devid    3 size 1.82TiB used 1.72TiB path /dev/sdc1
> devid    4 size 1.82TiB used 1.72TiB path /dev/sdd1
> devid    5 size 2.73TiB used 2.62TiB path /dev/sde1
> *** Some devices missing
> btrfs-progs v4.0

The kernel version in use might also give people some hints.

Also, you have not stated what raid type the fs is; likely not raid6,
but rather raid 1 or 10 or 5.
btrfs filesystem usage   will report and show this.

If it is raid6, you could still fix the issue in theory. AFAIK there
are no patches to fix a dual failure for any other raid type or for
single. The only option then is to use   btrfs restore   on the
unmounted array and hope to copy as much as possible off the damaged fs
to other storage.
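
To illustrate why the raid type matters for a dual failure, here is a
minimal sketch. The tolerance table is an assumption based on the generally
documented behaviour of each btrfs profile, not something read from kernel
source, and the names TOLERATED_MISSING and writable_mount_plausible are
made up for this example.

```python
# Number of missing devices each profile is generally documented to survive.
# Assumption for illustration only; not taken from any kernel source.
TOLERATED_MISSING = {
    "single": 0,
    "raid0": 0,
    "raid1": 1,
    "raid10": 1,
    "raid5": 1,
    "raid6": 2,
}


def writable_mount_plausible(profile: str, missing: int) -> bool:
    """Return True if `missing` absent devices are within the profile's tolerance."""
    return missing <= TOLERATED_MISSING[profile.lower()]


# Two devices are missing in the case discussed in this thread.
for profile in ("raid1", "raid6"):
    print(profile, writable_mount_plausible(profile, missing=2))
```

With two devices missing, only the raid6 entry stays within tolerance,
which matches the "too many missing devices" refusal in the quoted dmesg
output if the filesystem is any other type.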

> ubuntu@ubuntu:~$ sudo mount -o degraded /dev/sda3 /mnt
> mount: wrong fs type, bad option, bad superblock on /dev/sda3,
>missing codepage or helper program, or other error
>
>In some cases useful info is found in syslog - try
>dmesg | tail or so.
> ubuntu@ubuntu:~$ dmesg
> [...]
> [  749.322385] BTRFS info (device sde1): allowing degraded mounts
> [  749.322404] BTRFS info (device sde1): disk space caching is enabled
> [  749.323571] BTRFS warning (device sde1): devid 6 uuid
> f41bcb72-e88a-432f-9961-01307ec291a9 is missing
> [  749.335543] BTRFS warning (device sde1): devid 7 uuid
> 17f8e02a-923e-4ac3-9db2-eb1b47c1a8db missing
> [  749.407802] BTRFS: bdev (null) errs: wr 81791613, rd 57814378,
> flush 0, corrupt 0, gen 0
> [  749.407808] BTRFS: bdev /dev/sde1 errs: wr 0, rd 5002, flush 0,
> corrupt 0, gen 0
> [  774.759717] BTRFS: too many missing devices, writeable mount is not allowed
> [  774.804053] BTRFS: open_ctree failed
>
> Thank you in advance for your help,
>
> -JohnF


Re: csum errors in VirtualBox VDI files

2016-03-22 Thread Kai Krakow
On Tue, 22 Mar 2016 21:07:35 +0100,
Henk Slager wrote:

> On Tue, Mar 22, 2016 at 9:03 AM, Kai Krakow 
> wrote:
> > Hello!
> >
> > Since one of the last kernel updates (I don't know which exactly),
> > I'm experiencing csum errors within VDI files when running
> > VirtualBox. A side effect of this is, as soon as dmesg shows these
> > errors, commands like "du" and "df" hang until reboot.
> >
> > I've now restored the file from backup but it happens over and over
> > again.
> >
> > On another machine I'm also seeing errors with big files in the
> > following scenario (apparently an older kernel, 4.1.x AFAIR):
> >
> > # ntfsclone --save /dev/md126p2 -o rescue.ntfs.img
> >^ big NTFS partition   ^ file on btrfs
> >
> > results in a write error and the file system goes read-only.
> >
> > Both systems have in common they are using btrfs on bcache with
> > compress=lzo,autodefrag,nossd,discard (mraid=1,draid=0 and
> > mraid=1,draid=single).  
> 
> autodefrag,nossd:  I am using these too on bcached btrfs, so far no
> issues (including kernel 4.5.0). I have been writing bigfiles (> 50G)
> without problems.
> compress=lzo: I have used it on bcached btrfs in the past, worked fine
> discard: I am not sure what this should do on top of /dev/bcacheX ; I
> don't know how this relates to the bcache discard; I currently have
> bcache setting:  Discard?  False
> 
> I am not saying that the last mount option is the direct cause of the
> problems you experience, it is just that I don't know its impact
> currently.

Well, that may be interesting... My backup drive doesn't use discard and
shows no such misbehavior. I'll try disabling it as soon as I have
some time to look into this.

I guess only the devs can tell how discard interacts with bcache, and
if it makes sense to enable it for this setup.

-- 
Regards,
Kai

Replies to list-only preferred.



Re: RAID Assembly with Missing Empty Drive

2016-03-22 Thread John Marrett
After further discussion in #btrfs:

I left out the raid level, it's raid1:

ubuntu@ubuntu:~$ sudo btrfs filesystem df /mnt
Data, RAID1: total=6.04TiB, used=5.46TiB
System, RAID1: total=32.00MiB, used=880.00KiB
Metadata, RAID1: total=14.00GiB, used=11.59GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

It is possible to mount the filesystem with -o recover,ro

It may be possible to comment out this check:

https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770

And then to mount read/write, remove the failed drive, add a new
drive. If there are no more interesting suggestions forthcoming I will
try it, though to test I'll want to overlay the underlying devices and
then export them using iSCSI, AoE or NBD in order to avoid further
damage to my filesystem.

Unfortunately I don't have nearly enough disk space available to make
a complete copy of the data and rebuild the filesystem.

-JohnF


Re: RAID Assembly with Missing Empty Drive

2016-03-22 Thread John Marrett
Henk,

> I assume you did   btrfs device add   ?
> Or did you do this with   btrfs replace   ?

Just realised I missed this question, sorry, I performed an add
followed by a (failed) delete.

-JohnF



Re: [PATCH v2] fstest: btrfs: test single 4k extent after subpagesize buffered writes

2016-03-22 Thread Liu Bo
On Tue, Mar 22, 2016 at 12:00:13PM +0800, Eryu Guan wrote:
> On Thu, Mar 17, 2016 at 03:56:38PM -0700, Liu Bo wrote:
> > This is to test if COW enabled btrfs can end up with single 4k extents
> > when doing subpagesize buffered writes.
> 
> What happens if btrfs is mounted with "nodatacow" option? Does it need
> to _notrun if cow is disabled?

In my test, the test passes if mounting with "nodatacow".
Yes, it makes sense to have a _notrun for nodatacow.

> 
> > 
> > The patch to fix the problem is
> >   https://patchwork.kernel.org/patch/8527991/
> > 
> > Signed-off-by: Liu Bo 
> > ---
> > v2: - Teach awk to know system's pagesize.
> > - Add "Silence is golden" to output.
> > - Use local variables to lower case.
> > - Add comments to make code clear.
> 
> This should be v3, and this patch was buried in the v2 thread :)

Oops, thanks for pointing it out.

> 
> > 
> >  tests/btrfs/027 | 102 
> > 
> >  tests/btrfs/027.out |   2 ++
> >  tests/btrfs/group   |   1 +
> >  3 files changed, 105 insertions(+)
> >  create mode 100755 tests/btrfs/027
> >  create mode 100644 tests/btrfs/027.out
> > 
> > diff --git a/tests/btrfs/027 b/tests/btrfs/027
> > new file mode 100755
> > index 000..19d324b
> > --- /dev/null
> > +++ b/tests/btrfs/027
> > @@ -0,0 +1,102 @@
> > +#! /bin/bash
> > +# FS QA Test 027
> > +#
> > +# When btrfs is using cow mode, buffered writes of sub-pagesize can end up with
> > +# single 4k extents.
> > +# Ref:
> > +# "Stray 4k extents with slow buffered writes"
> > +# https://www.spinics.net/lists/linux-btrfs/msg52628.html
> 
> After going through this thread, my understanding is that nodatacow
> btrfs should pass this test even on unpatched kernel (e.g. v4.5). But
> my test on v4.5 kernel failed with nodatacow mount option, pagesize
> extent is still found.
> 

I verified it again on my kvm box and it passed with an unpatched v4.5 kernel.

Can you please show me the 027.full file?

I can't think of a reason for this..

> > +#
> > +#---
> > +# Copyright (c) 2016 Liu Bo.  All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#---
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1   # failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +   cd /
> > +   rm -f $tmp.*
> > +
> > +   # restore expire
> > +   echo $default_expire > /proc/sys/vm/dirty_expire_centisecs
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +
> > +# remove previous $seqres.full before test
> > +rm -f $seqres.full
> > +echo "Silence is golden"
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs btrfs
> > +_supported_os Linux
> > +_require_scratch
> > +_require_xfs_io_command "fiemap"
> > +
> > +_scratch_mkfs >> $seqres.full 2>&1
> > +_scratch_mount
> > +
> > +default_expire=`cat /proc/sys/vm/dirty_expire_centisecs`
> > +# Make it flush dirty pages more frequently to make sure we reproduce the bug.
> > +echo 50 > /proc/sys/vm/dirty_expire_centisecs
> > +
> > +tfile=$SCRATCH_MNT/testfile
> > +pagesize=$(get_page_size)
> > +sublen=$((RANDOM % pagesize))
> > +
> > +$XFS_IO_PROG -f -c "pwrite 0 $pagesize" $tfile > /dev/null 2>&1
> > +# write some subpagesize data first.
> > +$XFS_IO_PROG -c "pwrite $pagesize $sublen" $tfile > /dev/null 2>&1
> > +
> > +# Mix up "abnormal" subpagesize writes with normal pagesize based writes
> > +toff=$((pagesize + sublen))
> > +for ((i = 0; i < 1; i++))
> > +do
> > +   tlen=$pagesize
> > +   if [ $((i % 2)) = 0 ]; then
> > +   tlen=$((pagesize * 3))
> > +   fi
> > +   if [ $((i % 1000)) = 0 ]; then
> > +   tlen=$((RANDOM % pagesize))
> > +   fi
> > +
> > +   $XFS_IO_PROG -c "pwrite $toff $tlen" $tfile > /dev/null 2>&1
> > +   toff=$((toff + tlen))
> > +done
> 
> fstests prefers this format:
> 
> for ...; do
>   ...
> done

OK, thank you very much, Eryu!

Thanks,

-liubo

> 
> Thanks,
> Eryu
> 
> > +
> > +sync
> > +
> > +# check for single PAGESIZE extent
> >

Re: [PATCH] fstests: _fail the tests if _scratch_mount failed to avoid fully filling root fs

2016-03-22 Thread Dave Chinner
On Mon, Mar 21, 2016 at 03:23:41PM +0800, Eryu Guan wrote:
> btrfs failed to mount small fs on ppc64 host with error ENOSPC, even
> creating such small fs succeeded, then generic/027 consumed all free
> space on root fs not on SCRATCH_DEV and test harness cannot create tmp
> files and continue other tests.
> 
> Though I think it's a btrfs bug, it's still worth preventing this
> situation from happening in the harness, as such tests usually aim to
> exercise fs on ENOSPC conditions, there's no point to continue if the
> small fs is not mounted.

I think the btrfs bug should be fixed. At minimum, the workaround to
see if the filesystem can be mounted should be in btrfs's
implementation of scratch_mkfs_sized.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Btrfsck memory usage reduce idea

2016-03-22 Thread Qu Wenruo



David Sterba wrote on 2016/03/22 15:49 +0100:
> On Mon, Mar 21, 2016 at 10:15:55AM +0800, Qu Wenruo wrote:
>>> IOW, there will be two options for the user to choose from, right? That's
>>> what I'd expect. Be able to check the filesystem on a machine with less
>>> memory at the cost of IO, but also do the faster check on a different
>>> machine.
>>
>> I was planning to use the new extent tree check to replace the current one,
>> as a rework.
>> Am I always reworking things? :)
>
> The problem with big reworks is that there are few people willing to
> review them. So I'm not against doing such changes, especially in this
> case it would be welcome, but I'm afraid that it could end up stalled
> similar to the convert rewrite.

So for the convert rework, unless some other developer reviews the patchset,
it won't be merged, right?

To avoid the same problem, what about submitting small patchsets and
replacing the extent tree fsck code part by part?

(Although I'm not sure if it's possible.)

Reviewers would be much happier reviewing 5 patches 5 times
than reviewing one big 25-patch set.

Thanks,
Qu

>> The point that I didn't want to keep the current behavior is, the old
>> one is just OK or OOM, no one would know if it will OOM until it happens.
>>
>> But the new one would be much more flexible than the current behavior,
>> as it fully uses the IO cache provided by the kernel.
>
> That's a good point, for the single implementation.







Re: moving btrfs subvolumes to new disk

2016-03-22 Thread Ryan Erato
Finally got around to running the suggested commands. Same error with
the send, but not much output to help.  The check operation did seem
to reveal some potential issues. Here's the play-by-play along with
the file output from check:

[liveuser@localhost /]$ sudo btrfs check /dev/sda6 >
/home/liveuser/btrfscheck.txt
checking extents
checking free space cache
checking fs roots
root 257 inode 13324701 errors 200, dir isize wrong
root 258 inode 226392 errors 200, dir isize wrong
root 258 inode 236055 errors 2000, link count wrong
unresolved ref dir 226392 index 35 namelen 7 name LOG.old filetype 0
errors 3, no dir item, no dir index
root 258 inode 236273 errors 2000, link count wrong
root 258 inode 236276 errors 2000, link count wrong
unresolved ref dir 226392 index 39 namelen 15 name MANIFEST-15
filetype 0 errors 3, no dir item, no dir index
root 258 inode 236277 errors 2000, link count wrong
unresolved ref dir 226392 index 41 namelen 7 name CURRENT filetype 0
errors 3, no dir item, no dir index
root 258 inode 240618 errors 2000, link count wrong
unresolved ref dir 226392 index 115 namelen 10 name 89.log
filetype 0 errors 3, no dir item, no dir index
root 487 inode 13324701 errors 200, dir isize wrong
root 488 inode 226392 errors 200, dir isize wrong
root 488 inode 236055 errors 2000, link count wrong
unresolved ref dir 226392 index 35 namelen 7 name LOG.old filetype 0
errors 3, no dir item, no dir index
root 488 inode 236273 errors 2000, link count wrong
root 488 inode 236276 errors 2000, link count wrong
unresolved ref dir 226392 index 39 namelen 15 name MANIFEST-15
filetype 0 errors 3, no dir item, no dir index
root 488 inode 236277 errors 2000, link count wrong
unresolved ref dir 226392 index 41 namelen 7 name CURRENT filetype 0
errors 3, no dir item, no dir index
root 488 inode 240618 errors 2000, link count wrong
unresolved ref dir 226392 index 115 namelen 10 name 89.log
filetype 0 errors 3, no dir item, no dir index
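
Since the same two error messages repeat across several roots in the output
above, a small tally can make the pattern easier to see. This is only a
sketch: it assumes the "root N inode M errors X, message" line shape shown
above, and tally_inode_errors is a made-up helper, not part of btrfs-progs.

```python
import re
from collections import Counter

# Matches lines such as: "root 258 inode 236055 errors 2000, link count wrong"
LINE_RE = re.compile(r"^root (\d+) inode (\d+) errors ([0-9a-f]+), (.+)$")


def tally_inode_errors(text: str) -> Counter:
    """Count (root, message) pairs in check output; other lines are ignored."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            root, _inode, _code, message = match.groups()
            counts[(int(root), message)] += 1
    return counts


# A few lines copied from the output above.
sample = """\
root 257 inode 13324701 errors 200, dir isize wrong
root 258 inode 226392 errors 200, dir isize wrong
root 258 inode 236055 errors 2000, link count wrong
unresolved ref dir 226392 index 35 namelen 7 name LOG.old filetype 0
errors 3, no dir item, no dir index
root 258 inode 236273 errors 2000, link count wrong
"""

for (root, message), count in sorted(tally_inode_errors(sample).items()):
    print(f"root {root}: {count} x {message}")
```

The "unresolved ref" lines and their wrapped continuations are skipped on
purpose, because they do not start with "root".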


" btrfscheck.txt
Checking filesystem on /dev/sda6
UUID: 6bb38bce-d824-4b9c-8b03-adad460c0f97
found 79333650514 bytes used err is 1
total csum bytes: 60906940
total tree bytes: 1530494976
total fs tree bytes: 1392934912
total extent tree bytes: 59506688
btree space waste bytes: 343045471
file data blocks allocated: 97137004544
 referenced 85008084992


[liveuser@localhost /]$ sudo mount /dev/sda6 /mnt/hdd

[liveuser@localhost /]$ sudo btrfs send -vvv --no-data -f homesnap.btr
/mnt/hdd/home/home.snap/
Mode NO_FILE_DATA enabled
At subvol /mnt/hdd/home/home.snap/
ERROR: send ioctl failed with -2: No such file or directory



On Sun, Mar 20, 2016 at 10:42 PM, Chris Murphy  wrote:
> On Sun, Mar 20, 2016 at 10:34 PM, Ryan Erato  wrote:
> .
>>
>> Sending "home.snap" to "/mnt/ssd" results in the -2 error. What is
>> peculiar, or possibly a red herring, is that it seems to fail at the
>> same point each time, at 4.39GB in to the transfer.
>
>
>
> That's pretty suspicious. I didn't realize from the first description
> that the command is doing something for a while before failing. I
> thought it was failing immediately.
>
> Try this:
>
> btrfs send -vvv --no-data -f homesnap.btr home.snapshot
>
> That will write out metadata only to a file, no receive. See if the
> error still happens and if the extra v gives more info.
>
> If it still fails with no more useful information then what I'd try
> next is a btrfs check with the most recent btrfs-progs you can find.
> If you're in need of a suggestion, this has btrfs-progs 4.4.1, I've
> tested that it boots, it's got a published sha256 hash, and is served
> over https. Yes, it's not even an alpha, but all you're doing is a
> check, not a --repair, and no need to mount it (although that's
> probably safe also, I've been doing it most of the weekend).
> https://dl.fedoraproject.org/pub/alt/stage/24_Alpha-1.6/Workstation/x86_64/iso/
> dd that iso file to a USB stick, it will destroy all data on the
> stick, and then boot the computer, and switch to tty2 (control-alt-f2)
> to get to a shell.
>
> I think 'btrfs check > btrfscheck.txt' will output most of the results
> to a text file. Often it misses the first few lines for whatever
> reason. You can either 'fpaste ' and then note the URL and
> post it here, or you can scp the file elsewhere, if you have wired
> ethernet connected.
>
>
> --
> Chris Murphy


reflink copy 'invalid argument'

2016-03-22 Thread Chris Murphy
kernel is 4.5.0

The source is on one subvolume, the destination on another, and the
full path from top-level id5 is being used for the copy. I've not
seen this error before, so I'm not sure what's going on. The
suspicious part of the strace, to me, is this line:

ioctl(4, BTRFS_IOC_CLONE, 0x3)  = -1 EINVAL (Invalid argument)

?

I don't think this is kernel version related, because I've been using
4.5.0 since release and I've done such reflink copies before. But I
can't see how this is user error.


[root@f23m images]# mount | grep sda
/dev/sda5 on / type btrfs
(rw,noatime,seclabel,ssd,space_cache,subvolid=288,subvol=/f23w-root)
/dev/sda5 on /home type btrfs
(rw,noatime,seclabel,ssd,space_cache,subvolid=289,subvol=/home)
/dev/sda5 on /mnt type btrfs
(rw,relatime,seclabel,ssd,space_cache,subvolid=5,subvol=/)


[root@f23m images]# strace cp --reflink
/mnt/home/chris/Downloads/Fedora-Cloud-Base-24_Alpha-7.x86_64.qcow2
/mnt/f23w-root/var/lib/libvirt/images/
execve("/bin/cp", ["cp", "--reflink",
"/mnt/home/chris/Downloads/Fedora"...,
"/mnt/f23w-root/var/lib/libvirt/i"...], [/* 31 vars */]) = 0
brk(NULL)   = 0x5559e3d18000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f37bc5ce000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=88066, ...}) = 0
mmap(NULL, 88066, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f37bc5b8000
close(3)= 0
open("/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260b\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=136600, ...}) = 0
mmap(NULL, 2237248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f37bc18c000
mprotect(0x7f37bc1ab000, 2097152, PROT_NONE) = 0
mmap(0x7f37bc3ab000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1f000) = 0x7f37bc3ab000
mmap(0x7f37bc3ad000, 4928, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f37bc3ad000
close(3)= 0
open("/lib64/libacl.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200\37\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=35992, ...}) = 0
mmap(NULL, 2130048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f37bbf83000
mprotect(0x7f37bbf8b000, 2093056, PROT_NONE) = 0
mmap(0x7f37bc18a000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0x7f37bc18a000
close(3)= 0
open("/lib64/libattr.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\23\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=23320, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f37bc5b7000
mmap(NULL, 2117648, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f37bbd7d000
mprotect(0x7f37bbd81000, 2097152, PROT_NONE) = 0
mmap(0x7f37bbf81000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x7f37bbf81000
mmap(0x7f37bbf82000, 16, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f37bbf82000
close(3)= 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\6\2\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2103656, ...}) = 0
mmap(NULL, 3934784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f37bb9bc000
mprotect(0x7f37bbb73000, 2097152, PROT_NONE) = 0
mmap(0x7f37bbd73000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b7000) = 0x7f37bbd73000
mmap(0x7f37bbd79000, 14912, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f37bbd79000
close(3)= 0
open("/lib64/libpcre.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\26\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=459248, ...}) = 0
mmap(NULL, 2552072, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f37bb74c000
mprotect(0x7f37bb7bb000, 2093056, PROT_NONE) = 0
mmap(0x7f37bb9ba000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6e000) = 0x7f37bb9ba000
close(3)= 0
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\16\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=19776, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f37bc5b6000
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f37bb548000
mprotect(0x7f37bb54b000, 2093056, PROT_NONE) = 0
mmap(0x7f37bb74a000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3,

Re: reflink copy 'invalid argument'

2016-03-22 Thread Chris Murphy
OK OK, Mr. Short Term Memory here.

The target images/ directory has the +C (nodatacow) attribute set. When
it's removed, the problem doesn't happen. I've run into this before, and
asked about it on the list before. So yeah...


Chris Murphy


Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework

2016-03-22 Thread Qu Wenruo

First, thank you for your interest in the dedupe patchset.

In fact, I'm quite afraid that if no one is interested in the patchset, it
may be delayed again to 4.8.


David Sterba wrote on 2016/03/22 14:38 +0100:
> On Tue, Mar 22, 2016 at 09:35:25AM +0800, Qu Wenruo wrote:
>> This updated version of inband de-duplication has the following features:
>> 1) ONE unified dedup framework.
>> 2) TWO different back-end with different trade-off
>
> The on-disk format is defined in code, would be good to give some
> overview here.


No problem at all.
(Although not sure if it's a good idea to explain it in mail. Maybe wiki 
is much better?)


There are 3 dedupe related on-disk items.

1) dedupe status
   Used by both dedupe backends. Mainly used to record the dedupe
   backend info, allowing btrfs to resume its dedupe setup after umount.

Key contents:
   Objectid, Type                  , Offset
  (0       , DEDUPE_STATUS_ITEM_KEY, 0     )

Structure contents:
  dedupe block size: records the dedupe block size
  limit_nr:          in-memory hash limit
  hash_type:         only SHA256 is supported so far
  backend:           in-memory or on-disk

2) dedupe hash item
   The main item for the on-disk dedupe backend.
   It's used for hash -> extent search.
   A duplicated hash won't be inserted into the dedupe tree.

Key contents:
   Objectid            , Type                , Offset
  (Last 64 bits of hash, DEDUPE_HASH_ITEM_KEY, Bytenr of the extent)

Structure contents:
  len:         The in-memory length of the extent.
               Should always match dedupe_bs.
  disk_len:    The on-disk length of the extent; differs from len
               if the extent is compressed.
  compression: Compression algorithm.
  hash:        Complete hash (SHA256) of the extent, including
               the last 64 bits.

  The structure is a simplified file extent with the hash added and
  the offset removed.

3) dedupe bytenr item
   Helper structure, mainly used for extent -> hash lookup when
   freeing an extent.
   One-to-one mapping with the dedupe hash item.

Key contents:
   Objectid     , Type                       , Offset
  (Extent bytenr, DEDUPE_HASH_BYTENR_ITEM_KEY, Last 64 bits of hash)

Structure contents:
  Hash: Complete hash (SHA256) of the extent.
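
To make the two key layouts above concrete, here is a minimal userspace sketch of how the hash item key and the bytenr item key mirror each other. It is not code from the patchset: struct demo_key, the DEMO_* type values and both helper names are hypothetical, and reading the last 64 bits of the hash as little-endian is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHA256_LEN 32

/* Hypothetical stand-in for a btrfs key: (objectid, type, offset). */
struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Placeholder type values; the real ones are defined by the patchset. */
enum { DEMO_DEDUPE_HASH_ITEM = 1, DEMO_DEDUPE_BYTENR_ITEM = 2 };

/* Last 64 bits of the hash. Little-endian byte order is an assumption. */
static uint64_t hash_tail64(const uint8_t hash[SHA256_LEN])
{
	uint64_t v = 0;

	assert(hash != NULL);
	for (int i = 0; i < 8; i++)
		v |= (uint64_t)hash[SHA256_LEN - 8 + i] << (8 * i);
	return v;
}

/* Key used for hash -> extent search: (hash tail, HASH_ITEM, bytenr). */
static struct demo_key hash_item_key(const uint8_t hash[SHA256_LEN],
				     uint64_t bytenr)
{
	struct demo_key k = { hash_tail64(hash), DEMO_DEDUPE_HASH_ITEM, bytenr };
	return k;
}

/* Key used for extent -> hash lookup: (bytenr, BYTENR_ITEM, hash tail). */
static struct demo_key bytenr_item_key(const uint8_t hash[SHA256_LEN],
				       uint64_t bytenr)
{
	struct demo_key k = { bytenr, DEMO_DEDUPE_BYTENR_ITEM, hash_tail64(hash) };
	return k;
}
```

The point of the sketch is only that objectid and offset swap roles between the two items, which is what gives the one-to-one mapping between them.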




3) Support compression with dedupe
4) Ioctl interface with persist dedup status


I'd like to see the ioctl specified in more detail. So far there's
enable, disable and status. I'd expect some way to control the in-memory
limits, let it "forget" current hash cache, specify the dedupe chunk
size, maybe sync of the in-memory hash cache to disk.


So the current and planned ioctls are as follows, with some details
related to your in-memory limit control concerns.


1) Enable
   Enable dedupe if it's not enabled already. (disabled -> enabled)
   Or change the current dedupe setting to another. (re-configure)

   For a dedupe_bs/backend/hash algorithm (only SHA256 so far) change,
   it will disable dedupe (dropping all hashes) and then enable it with
   the new setting.

   For the in-memory backend, if only the limit differs from the
   previous setting, the limit can be changed on the fly without
   dropping any hash.

2) Disable
   Disable will drop all hashes and delete the dedupe tree if it exists.
   Implies a full sync_fs().

3) Status
   Output basic status of the current dedupe setup, including running
   status (disabled/enabled), dedupe block size, hash algorithm, and
   limit setting for the in-memory backend.

4) (PLANNED) In-memory hash size querying
   Allows userspace to query the in-memory hash structure header size.
   Used by the "btrfs dedupe enable" '-l' option to output a warning if
   the user specifies a memory size larger than 1/4 of the total memory.

5) (PLANNED) Dedupe rate statistics
   Should be handy for users to know the dedupe rate so they can
   further fine-tune their dedupe setup.

So for your "in-memory limit control", just enable it with a different limit.
For "dedupe block size change", just enable it with a different dedupe_bs.
For "forget hash", just disable it.

And for "write in-memory hash onto disk", it's not planned and may never
be done due to the complexity, sorry.
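
The enable/re-configure rules above can be summarised as a small decision function. This is only an illustrative sketch, not the kernel implementation: struct demo_cfg, the DEMO_* enums and enable_action() are hypothetical names, and treating a limit change on the on-disk backend as a no-op is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_backend { DEMO_BACKEND_INMEMORY, DEMO_BACKEND_ONDISK };

enum demo_action {
	DEMO_NOOP,		/* nothing changes */
	DEMO_ENABLE_FRESH,	/* disabled -> enabled */
	DEMO_UPDATE_LIMIT,	/* change limit on the fly, keep hashes */
	DEMO_DROP_AND_REENABLE	/* disable (drop all hashes), then enable */
};

/* Hypothetical mirror of the fields kept in the dedupe status item. */
struct demo_cfg {
	bool enabled;
	enum demo_backend backend;
	uint64_t dedupe_bs;
	uint64_t limit_nr;
	int hash_type;
};

/* Decide what an "enable" request has to do, given the current setup. */
static enum demo_action enable_action(const struct demo_cfg *cur,
				      const struct demo_cfg *want)
{
	assert(cur && want);
	if (!cur->enabled)
		return DEMO_ENABLE_FRESH;
	if (cur->dedupe_bs != want->dedupe_bs ||
	    cur->backend != want->backend ||
	    cur->hash_type != want->hash_type)
		return DEMO_DROP_AND_REENABLE;
	if (cur->backend == DEMO_BACKEND_INMEMORY &&
	    cur->limit_nr != want->limit_nr)
		return DEMO_UPDATE_LIMIT;
	return DEMO_NOOP;
}
```

With this shape, "forget hash" needs no separate command: a disable, or any enable that lands in DEMO_DROP_AND_REENABLE, already drops every stored hash.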





5) Ability to disable dedup for given dirs/files


This would be good to extend to subvolumes.


I'm sorry, but I didn't quite understand the difference.
Doesn't a dir include subvolumes?

Or is the xattr for a subvolume only stored in its parent subvolume, and
not copied to its snapshots?





TODO:
1) Add extent-by-extent comparison for faster but more collision-prone algorithms
The current SHA256 hash is quite slow, and on some old (5 years ago) CPUs,
the CPU may even be the bottleneck rather than IO.
But a faster hash will definitely cause collisions, so we need
extent comparison before we introduce a new dedupe algorithm.


If sha256 is slow, we can use a less secure hash that's faster but will
do a full byte-to-byte comparison in case of hash collision, 

Re: [PATCH] fstests: _fail the tests if _scratch_mount failed to avoid fully filling root fs

2016-03-22 Thread Eryu Guan
On Wed, Mar 23, 2016 at 11:08:56AM +1100, Dave Chinner wrote:
> On Mon, Mar 21, 2016 at 03:23:41PM +0800, Eryu Guan wrote:
> > btrfs failed to mount a small fs on a ppc64 host with error ENOSPC, even
> > though creating such a small fs succeeded. generic/027 then consumed all
> > free space on the root fs rather than on SCRATCH_DEV, and the test harness
> > could not create tmp files and continue with other tests.
> > 
> > Though I think it's a btrfs bug, it's still worth preventing this
> > situation from happening in the harness, as such tests usually aim to
> > exercise fs on ENOSPC conditions, there's no point to continue if the
> > small fs is not mounted.
> 
> I think the btrfs bug should be fixed. At minimum, the workaround to
> see if the filesystem can be mounted should be in btrfs's
> implementation of scratch_mkfs_sized

OK, I'll add a workaround in _scratch_mkfs_sized. Thanks for reviewing!

Eryu


RAID-1 refuses to balance large drive

2016-03-22 Thread Brad Templeton
I have a RAID 1, and was running a bit low, so replaced a 2TB drive with
a 6TB.  The other drives are a 3TB and a 4TB.  After switching the
drive, I did a balance and ... essentially nothing changed.  It did not
balance clusters over to the 6TB drive off of the other 2 drives.  I
found it odd, and wondered if it would do it as needed, but as time went
on, the filesys got full for real.

Making inquiries on the IRC channel, it was suggested perhaps the drives
were too full for a balance, but they had at least 50gb free I would
estimate, when I swapped.  As a test, I added a 4th drive, a spare
20gb partition and did a balance.  The balance did indeed balance the 3
small drives, so they now each have 6gb unallocated, but the big drive
remained unchanged.   The balance reported it operated on almost all the
clusters, though.

Linux kernel 4.2.0 (Ubuntu Wily)

Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
        Total devices 4 FS bytes used 3.88TiB
        devid    1 size 3.62TiB used 3.62TiB path /dev/sdi2
        devid    2 size 2.73TiB used 2.72TiB path /dev/sdh
        devid    3 size 5.43TiB used 1.42TiB path /dev/sdg2
        devid    4 size 20.00GiB used 14.00GiB path /dev/sda1

btrfs fi usage /local

Overall:
    Device size:                  11.81TiB
    Device allocated:              7.77TiB
    Device unallocated:            4.04TiB
    Device missing:                  0.00B
    Used:                          7.76TiB
    Free (estimated):              2.02TiB  (min: 2.02TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,RAID1: Size:3.87TiB, Used:3.87TiB
   /dev/sda1  14.00GiB
   /dev/sdg2   1.41TiB
   /dev/sdh    2.72TiB
   /dev/sdi2   3.61TiB

Metadata,RAID1: Size:11.00GiB, Used:9.79GiB
   /dev/sdg2   5.00GiB
   /dev/sdh    7.00GiB
   /dev/sdi2  10.00GiB

System,RAID1: Size:32.00MiB, Used:572.00KiB
   /dev/sdg2  32.00MiB
   /dev/sdi2  32.00MiB

Unallocated:
   /dev/sda1   6.00GiB
   /dev/sdg2   4.02TiB
   /dev/sdh    5.52GiB
   /dev/sdi2   7.36GiB

--
btrfs fi df /local
Data, RAID1: total=3.87TiB, used=3.87TiB
System, RAID1: total=32.00MiB, used=572.00KiB
Metadata, RAID1: total=11.00GiB, used=9.79GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I would have presumed that a balance would take blocks found on both the
3TB and 4TB, and move one of them over to the 6TB until all had 1.3TB of
unallocated space.  But this does not happen.  Any clues on how to make
it happen?




Re: moving btrfs subvolumes to new disk

2016-03-22 Thread Chris Murphy
On Tue, Mar 22, 2016 at 7:40 PM, Ryan Erato  wrote:
> Finally got around to running the suggested commands. Same error with
> the send, but not much output to help.  The check operation did seem
> to reveal some potential issues. Here's the play-by-play along with
> the file output from check:
>
> [liveuser@localhost /]$ sudo btrfs check /dev/sda6 >
> /home/liveuser/btrfscheck.txt
> checking extents
> checking free space cache
> checking fs roots
> root 257 inode 13324701 errors 200, dir isize wrong
> root 258 inode 226392 errors 200, dir isize wrong
> root 258 inode 236055 errors 2000, link count wrong
> unresolved ref dir 226392 index 35 namelen 7 name LOG.old filetype 0
> errors 3, no dir item, no dir index
> root 258 inode 236273 errors 2000, link count wrong
> root 258 inode 236276 errors 2000, link count wrong
> unresolved ref dir 226392 index 39 namelen 15 name MANIFEST-15
> filetype 0 errors 3, no dir item, no dir index
> root 258 inode 236277 errors 2000, link count wrong
> unresolved ref dir 226392 index 41 namelen 7 name CURRENT filetype 0
> errors 3, no dir item, no dir index
> root 258 inode 240618 errors 2000, link count wrong
> unresolved ref dir 226392 index 115 namelen 10 name 89.log
> filetype 0 errors 3, no dir item, no dir index
> root 487 inode 13324701 errors 200, dir isize wrong
> root 488 inode 226392 errors 200, dir isize wrong
> root 488 inode 236055 errors 2000, link count wrong
> unresolved ref dir 226392 index 35 namelen 7 name LOG.old filetype 0
> errors 3, no dir item, no dir index
> root 488 inode 236273 errors 2000, link count wrong
> root 488 inode 236276 errors 2000, link count wrong
> unresolved ref dir 226392 index 39 namelen 15 name MANIFEST-15
> filetype 0 errors 3, no dir item, no dir index
> root 488 inode 236277 errors 2000, link count wrong
> unresolved ref dir 226392 index 41 namelen 7 name CURRENT filetype 0
> errors 3, no dir item, no dir index
> root 488 inode 240618 errors 2000, link count wrong
> unresolved ref dir 226392 index 115 namelen 10 name 89.log
> filetype 0 errors 3, no dir item, no dir index

OK so now the question is if 'btrfs check --repair' can fix this, and
what version to use? 4.4.1 or 4.5.0? Based on the changelog, you can
probably use either version. And I think it should be safe. But, you
should still have backups seeing as you can mount the volume.
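
As an aside on reading the check output quoted above: the "errors" values look like bitmasks printed in hexadecimal, since "errors 3" comes with two messages (no dir item, no dir index). Assuming that reading, a minimal decoder for the two inode error values seen in this report could look like the sketch below. The table only contains the pairings visible in the output above; struct demo_flag and decode_inode_errors() are hypothetical names, not btrfs-progs API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct demo_flag {
	unsigned int bit;
	const char *name;
};

/* Only the inode error bits that appear in the quoted output. */
static const struct demo_flag inode_flags[] = {
	{ 0x200,  "dir isize wrong" },
	{ 0x2000, "link count wrong" },
};

/*
 * Write a comma-separated list of the known error names into buf and
 * return whatever bits were not recognised (or did not fit in buf).
 */
static unsigned int decode_inode_errors(unsigned int errors, char *buf,
					size_t len)
{
	size_t used = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(inode_flags) / sizeof(inode_flags[0]); i++) {
		if (!(errors & inode_flags[i].bit))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? ", " : "", inode_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated; stop appending */
		used += (size_t)n;
		errors &= ~inode_flags[i].bit;
	}
	return errors;
}
```

A non-zero return value means the report contained a bit this small table does not know about, which is a hint to look at the btrfs-progs source rather than guess.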


>
> [liveuser@localhost /]$ sudo mount /dev/sda6 /mnt/hdd
>
> [liveuser@localhost /]$ sudo btrfs send -vvv --no-data -f homesnap.btr
> /mnt/hdd/home/home.snap/
> Mode NO_FILE_DATA enabled
> At subvol /mnt/hdd/home/home.snap/
> ERROR: send ioctl failed with -2: No such file or directory

OK, so it sounds to me like something is off with the metadata, and as a
result it's not going to do a send.


-- 
Chris Murphy


[PATCH] btrfs: Output more info for enospc_debug mount option

2016-03-22 Thread Qu Wenruo
As one user on the mailing list reported a reproducible balance ENOSPC
error, it's better to add more debug info for the enospc_debug mount option.

Reported-by: Marc Haber 
Signed-off-by: Qu Wenruo 
---
changelog:
  v2: Add output for block group bytenr
---
 fs/btrfs/extent-tree.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 53e1297..8507484 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9386,15 +9386,23 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
u64 dev_min = 1;
u64 dev_nr = 0;
u64 target;
+   int debug;
int index;
int full = 0;
int ret = 0;
 
+   debug = btrfs_test_opt(root, ENOSPC_DEBUG);
+
block_group = btrfs_lookup_block_group(root->fs_info, bytenr);
 
/* odd, couldn't find the block group, leave it alone */
-   if (!block_group)
+   if (!block_group) {
+   if (debug)
+   btrfs_warn(root->fs_info,
+  "can't find block group for bytenr %llu",
+  bytenr);
return -1;
+   }
 
min_free = btrfs_block_group_used(&block_group->item);
 
@@ -9448,8 +9456,13 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 * this is just a balance, so if we were marked as full
 * we know there is no space for a new chunk
 */
-   if (full)
+   if (full) {
+   if (debug)
+   btrfs_warn(root->fs_info,
+   "no space to alloc new chunk for block group %llu",
+   block_group->key.objectid);
goto out;
+   }
 
index = get_block_group_index(block_group);
}
@@ -9496,6 +9509,10 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
ret = -1;
}
}
+   if (debug && ret == -1)
+   btrfs_warn(root->fs_info,
+   "no space to allocate a new chunk for block group %llu",
+   block_group->key.objectid);
mutex_unlock(&root->fs_info->chunk_mutex);
btrfs_end_transaction(trans, root);
 out:
-- 
2.7.4





Re: [PATCH 06/12] xfs/030: fix output on newer filesystems

2016-03-22 Thread Dave Chinner
On Sat, Mar 05, 2016 at 12:20:50PM -0800, Christoph Hellwig wrote:
> Still fails for me:
> 
> --- tests/xfs/030.out 2016-03-03 07:55:58.556427678 +
> +++ /root/xfstests/results//xfs/030.out.bad   2016-03-05 20:20:17.561433837 +
> @@ -231,8 +231,6 @@
>  bad agbno AGBNO in agfl, agno 0
>  bad agbno AGBNO in agfl, agno 0
>  bad agbno AGBNO in agfl, agno 0
> -bad agbno AGBNO in agfl, agno 0
> -bad agbno AGBNO in agfl, agno 0

That's because the free lists are of different lengths on the
different fs configs. Not sure how best to handle it - maybe just
filter the entire "bad agbno in agfl" line out?

I'm going to commit the change anyway, as this is a separate issue
that needs to be solved.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: RAID-1 refuses to balance large drive

2016-03-22 Thread Qu Wenruo



Brad Templeton wrote on 2016/03/22 17:47 -0700:

I have a RAID 1, and was running a bit low, so replaced a 2TB drive with
a 6TB.  The other drives are a 3TB and a 4TB.  After switching the
drive, I did a balance and ... essentially nothing changed.  It did not
balance clusters over to the 6TB drive off of the other 2 drives.  I
found it odd, and wondered if it would do it as needed, but as time went
on, the filesys got full for real.


Did you resize the replaced device to max?
Without a resize, btrfs still considers that it can only use 2T of the 6T device.

Thanks,
Qu



Making inquiries on the IRC channel, it was suggested perhaps the drives
were too full for a balance, but they had at least 50gb free I would
estimate, when I swapped.  As a test, I added a 4th drive, a spare
20gb partition and did a balance.  The balance did indeed balance the 3
small drives, so they now each have 6gb unallocated, but the big drive
remained unchanged.   The balance reported it operated on almost all the
clusters, though.

Linux kernel 4.2.0 (Ubuntu Wily)

Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
        Total devices 4 FS bytes used 3.88TiB
        devid    1 size 3.62TiB used 3.62TiB path /dev/sdi2
        devid    2 size 2.73TiB used 2.72TiB path /dev/sdh
        devid    3 size 5.43TiB used 1.42TiB path /dev/sdg2
        devid    4 size 20.00GiB used 14.00GiB path /dev/sda1

btrfs fi usage /local

Overall:
    Device size:                  11.81TiB
    Device allocated:              7.77TiB
    Device unallocated:            4.04TiB
    Device missing:                  0.00B
    Used:                          7.76TiB
    Free (estimated):              2.02TiB  (min: 2.02TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,RAID1: Size:3.87TiB, Used:3.87TiB
   /dev/sda1  14.00GiB
   /dev/sdg2   1.41TiB
   /dev/sdh    2.72TiB
   /dev/sdi2   3.61TiB

Metadata,RAID1: Size:11.00GiB, Used:9.79GiB
   /dev/sdg2   5.00GiB
   /dev/sdh    7.00GiB
   /dev/sdi2  10.00GiB

System,RAID1: Size:32.00MiB, Used:572.00KiB
   /dev/sdg2  32.00MiB
   /dev/sdi2  32.00MiB

Unallocated:
   /dev/sda1   6.00GiB
   /dev/sdg2   4.02TiB
   /dev/sdh    5.52GiB
   /dev/sdi2   7.36GiB

--
btrfs fi df /local
Data, RAID1: total=3.87TiB, used=3.87TiB
System, RAID1: total=32.00MiB, used=572.00KiB
Metadata, RAID1: total=11.00GiB, used=9.79GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I would have presumed that a balance would take blocks found on both the
3TB and 4TB, and move one of them over to the 6TB until all had 1.3TB of
unallocated space.  But this does not happen.  Any clues on how to make
it happen?




Re: csum errors in VirtualBox VDI files

2016-03-22 Thread Qu Wenruo



Kai Krakow wrote on 2016/03/22 19:48 +0100:

Am Tue, 22 Mar 2016 16:47:10 +0800
schrieb Qu Wenruo :


Hi,

Kai Krakow wrote on 2016/03/22 09:03 +0100:

Hello!

Since one of the last kernel updates (I don't know which exactly),
I'm experiencing csum errors within VDI files when running
VirtualBox. A side effect of this is, as soon as dmesg shows these
errors, commands like "du" and "df" hang until reboot.

I've now restored the file from backup but it happens over and over
again.

On another machine I'm also seeing errors with big files in the
following scenario (apparently an older kernel, 4.1.x AFAIR):

# ntfsclone --save /dev/md126p2 -o rescue.ntfs.img
                   ^ big NTFS partition   ^ file on btrfs

results in a write error and the file system goes read-only.


When it goes RO, there must be some warning in the kernel log.
Would you please paste the kernel log?


Apparently, that system does not boot now due to errors in the bcache
b-tree. That being said, it may well be some bcache error and not
btrfs' fault. Apparently I couldn't catch the output, I was in a
hurry. It said "write error" and had some backtrace. I will come
back to this later.

Let's go to the system I currently care about (that one with the
always breaking VDI file):


Both systems have in common that they are using btrfs on bcache with
compress=lzo,autodefrag,nossd,discard (mraid=1,draid=0 and
mraid=1,draid=single).

The system mentioned first is running Kernel 4.5.0 with Gentoo
patch-set. I upgraded from the last 4.4.x kernel when I first
experienced this problem. The first time the problem resulted in a
duplicate extent which btrfsck wasn't able to fix, that's when I
first restored from backup. But now I'm getting csum errors in this
file over and over again, plus when rsync has run for backup, the
system no longer responds to "du" and "df" commands - it just hangs.

Known problem? Does it help if I send debug info? If so, please
instruct.


Does btrfs check report anything wrong?


After the error occurred?

Yes, some text about the extent being compressed and btrfs repair
not currently handling that case (I tried --repair as I have a
backup). I simply decided not to investigate that further at that point
but delete and restore the affected file from backup. However, this is
the message from dmesg (tho, I didn't catch the backtrace):

btrfs_run_delayed_refs:2927: errno=-17 Object already exists


That's nice, at least we have some clue.

It's almost certainly either a bug in the btrfs kernel code, which doesn't
handle delayed refs well (low possibility), or a corrupted fs that creates
something the kernel can't handle (I bet that's the case).




After this, the system went RO and I had to reboot. I ran btrfs check
and it told about a duplicate extent.


If the output of btrfsck can be posted, it would help a lot to locate the
problem and enhance btrfsck.



I identified the file (using
btrfs inspect and the inode number) being the VDI file, and restored it.
Afterwards, I upgraded from latest 4.4 to 4.5. Currently, I'm now
watching closer since this incident, and the file becomes damaged
without any message in the kernel log when doing some more than usual
IO in VirtualBox. When my backup script then runs over the file, I get
errors about missing csums - the block is not readable.


If btrfsck reports no other problem after your fix, --init-csum-tree
should handle such a case.



I now ran
ddrescue, and replaced the file to get a current and slightly damaged
VDI image back (my backup uses time rotation, so no problem). But
running chkdsk in VirtualBox damages the VDI again.

Regarding the other error on the other machine, I'm not completely
convinced bcache ain't involved in this problem.

As soon as I "produced" csum errors again, I'll run btrfs check. Or
should I do it now without forcing the csum error to occur?



If possible, running btrfsck now and posting all of its output is recommended.

Thanks,
Qu




Re: RAID-1 refuses to balance large drive

2016-03-22 Thread Brad Templeton
That's rather counter intuitive behaviour.  In most FSs, resizes are
needed when you do things like change the size of an underlying
partition, or you weren't using all the partition.  When you add one
drive with device add, and you then remove another with device delete,
why and how would the added device know to size itself to the device
that you are planning to delete?   Ie. I don't see how it could know
(you add the new drive before even telling it you want to remove the old
one) and I also can't see a reason it would not use all the drive you
tell it to add.

In any event, I did a btrfs fi resize 3:max /local on the 6TB as you
suggest, and have another balance running but it appears like all the
others to be doing nothing, though of course it will take hours.  Are
you sure it works that way?  Even before the resize, as you see below,
it indicates the volume is 6TB with 4TB of unallocated space.  It is
only the df that says full (and the fact that there is no unallocated
space on the 3TB and 4TB drives.)

On 03/22/2016 09:01 PM, Qu Wenruo wrote:
> 
> 
> Brad Templeton wrote on 2016/03/22 17:47 -0700:
>> I have a RAID 1, and was running a bit low, so replaced a 2TB drive with
>> a 6TB.  The other drives are a 3TB and a 4TB.  After switching the
>> drive, I did a balance and ... essentially nothing changed.  It did not
>> balance clusters over to the 6TB drive off of the other 2 drives.  I
>> found it odd, and wondered if it would do it as needed, but as time went
>> on, the filesys got full for real.
> 
> Did you resize the replaced device to max?
> Without a resize, btrfs still considers that it can only use 2T of the 6T device.
> 
> Thanks,
> Qu
> 
>>
>> Making inquiries on the IRC channel, it was suggested perhaps the drives
>> were too full for a balance, but they had at least 50gb free I would
>> estimate, when I swapped.  As a test, I added a 4th drive, a spare
>> 20gb partition and did a balance.  The balance did indeed balance the 3
>> small drives, so they now each have 6gb unallocated, but the big drive
>> remained unchanged.   The balance reported it operated on almost all the
>> clusters, though.
>>
>> Linux kernel 4.2.0 (Ubuntu Wily)
>>
>> Label: 'butter'  uuid: a91755d4-87d8-4acd-ae08-c11e7f1f5438
>>         Total devices 4 FS bytes used 3.88TiB
>>         devid    1 size 3.62TiB used 3.62TiB path /dev/sdi2
>>         devid    2 size 2.73TiB used 2.72TiB path /dev/sdh
>>         devid    3 size 5.43TiB used 1.42TiB path /dev/sdg2
>>         devid    4 size 20.00GiB used 14.00GiB path /dev/sda1
>>
>> btrfs fi usage /local
>>
>> Overall:
>>     Device size:                  11.81TiB
>>     Device allocated:              7.77TiB
>>     Device unallocated:            4.04TiB
>>     Device missing:                  0.00B
>>     Used:                          7.76TiB
>>     Free (estimated):              2.02TiB  (min: 2.02TiB)
>>     Data ratio:                       2.00
>>     Metadata ratio:                   2.00
>>     Global reserve:              512.00MiB  (used: 0.00B)
>>
>> Data,RAID1: Size:3.87TiB, Used:3.87TiB
>>    /dev/sda1  14.00GiB
>>    /dev/sdg2   1.41TiB
>>    /dev/sdh    2.72TiB
>>    /dev/sdi2   3.61TiB
>>
>> Metadata,RAID1: Size:11.00GiB, Used:9.79GiB
>>    /dev/sdg2   5.00GiB
>>    /dev/sdh    7.00GiB
>>    /dev/sdi2  10.00GiB
>>
>> System,RAID1: Size:32.00MiB, Used:572.00KiB
>>    /dev/sdg2  32.00MiB
>>    /dev/sdi2  32.00MiB
>>
>> Unallocated:
>>    /dev/sda1   6.00GiB
>>    /dev/sdg2   4.02TiB
>>    /dev/sdh    5.52GiB
>>    /dev/sdi2   7.36GiB
>>
>> --
>> btrfs fi df /local
>> Data, RAID1: total=3.87TiB, used=3.87TiB
>> System, RAID1: total=32.00MiB, used=572.00KiB
>> Metadata, RAID1: total=11.00GiB, used=9.79GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> I would have presumed that a balance would take blocks found on both the
>> 3TB and 4TB, and move one of them over to the 6TB until all had 1.3TB of
>> unallocated space.  But this does not happen.  Any clues on how to make
>> it happen?
>>
>>
>>
>>
> 


Re: RAID-1 refuses to balance large drive

2016-03-22 Thread Chris Murphy
On Tue, Mar 22, 2016 at 10:47 PM, Brad Templeton  wrote:
> That's rather counter intuitive behaviour.  In most FSs, resizes are
> needed when you do things like change the size of an underlying
> partition, or you weren't using all the partition.  When you add one
> drive with device add, and you then remove another with device delete,
> why and how would the added device know to size itself to the device
> that you are planning to delete?   Ie. I don't see how it could know
> (you add the new drive before even telling it you want to remove the old
> one) and I also can't see a reason it would not use all the drive you
> tell it to add.
>
> In any event, I did a btrfs fi resize 3:max /local on the 6TB as you
> suggest, and have another balance running but it appears like all the
> others to be doing nothing, though of course it will take hours.  Are
> you sure it works that way?  Even before the resize, as you see below,
> it indicates the volume is 6TB with 4TB of unallocated space.  It is
> only the df that says full (and the fact that there is no unallocated
> space on the 3TB and 4TB drives.)


It does work that way, and I agree offhand that the lack of
automatically doing a resize to max is counterintuitive. I'd think
the user has implicitly set the size they want by handing over the
device to Btrfs, be it a whole device, partition or LV. There might be
some notes in the mail archive and possibly comments in btrfs-progs
that explain the logic.

devid    1 size 3.62TiB used 3.62TiB path /dev/sdi2
devid    2 size 2.73TiB used 2.72TiB path /dev/sdh
devid    3 size 5.43TiB used 1.42TiB path /dev/sdg2

Also note that after a successful balance this will not be evenly
allocated because device sizes aren't even. Simplistically it'll do
something like this: copy 1 chunks on devid3 and copy 2 chunks on
devid1 until the free space on devid1 is equal to free space on
devid2. And then it'll start alternating copy 2 chunks between devid1
and 2, while copy 1 chunks continue to write on devid3. That happens
until free space on all three is equal, and then allocation alternates
among all three to try to maintain approximately equal free space
remaining.
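
The allocation pattern described above can be reproduced with a toy simulation. This is a simplified sketch, not the kernel allocator: it assumes every raid1 chunk is one unit in size and always goes to the two devices that currently have the most free space. simulate_raid1() is a hypothetical name.

```c
#include <assert.h>

/*
 * Allocate mirrored one-unit chunks until fewer than two devices have
 * room left. free_space[] is updated in place; the return value is the
 * number of chunks (i.e. usable units) that could be allocated.
 */
static unsigned long simulate_raid1(unsigned long free_space[], int ndev)
{
	unsigned long chunks = 0;

	assert(ndev >= 2);
	for (;;) {
		int a = -1, b = -1;	/* most and second-most free device */

		for (int i = 0; i < ndev; i++) {
			if (a < 0 || free_space[i] > free_space[a]) {
				b = a;
				a = i;
			} else if (b < 0 || free_space[i] > free_space[b]) {
				b = i;
			}
		}
		if (free_space[b] == 0)
			break;		/* no second device with room: stop */
		free_space[a]--;	/* copy 1 */
		free_space[b]--;	/* copy 2 */
		chunks++;
	}
	return chunks;
}
```

Starting from free space of 4, 3 and 6 units, this places one copy of almost every chunk on the largest device and ends with 6 chunks allocated and a single unit stranded on the large device, which is the "uneven but fully used" outcome described above.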

You might find this helpful:
http://carfax.org.uk/btrfs-usage/



-- 
Chris Murphy


Re: RAID-1 refuses to balance large drive

2016-03-22 Thread Chris Murphy
On Tue, Mar 22, 2016 at 11:54 PM, Brad Templeton  wrote:
> Actually, the URL suggests that all the space will be used, which is
> what I had read about btrfs, that it handled this.

It will. But it does this by dominating writes to the devices that
have the most free space, until all devices have the same free space.
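
For a rough idea of how much a two-copy raid1 can hold across uneven devices, the usual rule of thumb is the smaller of "half the total" and "total minus the largest device". The sketch below only encodes that rule; raid1_usable() is a hypothetical helper, and it ignores metadata overhead and chunk granularity.

```c
#include <assert.h>
#include <stdint.h>

/* Usable capacity of a two-copy raid1, in the same unit as size[]. */
static uint64_t raid1_usable(const uint64_t size[], int ndev)
{
	uint64_t total = 0, largest = 0;

	assert(ndev >= 2);
	for (int i = 0; i < ndev; i++) {
		total += size[i];
		if (size[i] > largest)
			largest = size[i];
	}
	/* Every chunk needs a second copy on a different device. */
	uint64_t rest = total - largest;
	return rest < total / 2 ? rest : total / 2;
}
```

Feeding in the three large devices from this thread in hundredths of a TiB (362, 273, 543) gives 589, i.e. about 5.89TiB usable, so the 6TB drive should not be left mostly empty once allocation works as intended.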


> But again, how could it possibly know to restrict the new device to only
> using 2TB?

In your case, before resizing it, it's just inheriting the size from
the device being replaced.

>
> Stage one:  Add the new 6TB device.  The 2TB device is still present.
>
> Stage two:  Remove the 2TB device.

OK this is confusing. In your first post you said replaced. That
suggests you used 'btrfs replace start' rather than 'btrfs device add'
followed by 'btrfs device remove'. So which did you do?

If you did the latter, then there's no resize necessary.


> The system copies everything on it
> to the device which has the most space, the empty 6TB device.  But you
> are saying it decides to _shrink_ the 6TB device now that we know it is
> a 2TB device being removed?

No, I'm not. The source of confusion appears to be that you're
unfamiliar with 'btrfs replace', so you're using "replaced" to mean
'dev add' followed by 'dev remove'.

This line:
devid    3 size 5.43TiB used 1.42TiB path /dev/sdg2

suggests it's using the entire 6TB of the newly added drive, it's
already at max size.


> We didn't know the 2TB would be removed
> when we added the 6TB, so I just can't fathom why the code would do
> that.  In addition, the stats I get back say it didn't do that.

I don't understand the first part. Whether you asked for 'dev remove'
or you used 'replace' both of those mean removing some device. You
have to specify the device to be removed.

Now might be a good time to actually write out the exact commands you've used.


>
> More to the point, after the resize, the balance is still not changing
> any size numbers.  It should be moving blocks to the most empty device,
> should it not?  There is almost no space on devids 1 and 2, so it
> would not copy any chunks there.
>
> I'm starting to think this is a bug, but I'll keep plugging.

Could be a bug. Three drive raid1 of different sizes is somewhat
uncommon so it's possible it's hit an edge case somehow. Qu will know
more about how to find out why it's not allocating mostly to the
larger drive. The eventual work around might end up being to convert
data chunks to single, then convert back to raid1. But before doing
that it'd be better to find out why it's not doing the right thing the
normal way.


-- 
Chris Murphy