Re: [PATCH] btrfs-progs: mkfs: allow DUP on multidev fs, only warn

2016-03-24 Thread Hugo Mills
On Fri, Mar 25, 2016 at 01:59:20AM +, Duncan wrote:
> David Sterba posted on Thu, 24 Mar 2016 16:56:13 +0100 as excerpted:
> 
> > The DUP profile can work on multiple filesystems, the limitation is
> > rather artificial. Let the user make the decision and print a warning.
> > 
> > Signed-off-by: David Sterba 
> > ---
> 
> I like the change, but typo in the commit comment... What filesystems 
> other than btrfs allow DUP? =:^)  It's correct in the posting subject.

   Class/instance problem. :)

   Is that class of filesystems, btrfs, or instance of a btrfs
filesystem? "Filesystem" covers both cases.

   Hugo, Ontologist(*).

(*) Yes, that's actually my job title these days.

-- 
Hugo Mills | Be pure.
hugo@... carfax.org.uk | Be vigilant.
http://carfax.org.uk/  | Behave.
PGP: E2AB1DE4  |   Torquemada, Nemesis


signature.asc
Description: Digital signature


Re: [PATCH] fstests: add btrfs test for fsync after snapshot deletion

2016-03-24 Thread Eryu Guan
On Thu, Mar 24, 2016 at 05:07:21PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Test that if we delete a snapshot, delete its parent directory, create
> another directory with the same name as that parent and then fsync either
> the new directory or a file inside the new directory, the fsync succeeds,
> the fsync log is replayable and produces a correct result.
> 
> This is motivated by a bug that is fixed by the following patch for
> btrfs (linux kernel):
> 
>   Btrfs: fix unreplayable log after snapshot deletion and parent
>   re-creation

Test fails on v4.5 kernel as expected, but I failed to compile btrfs
after applying this patch, seems btrfs_must_commit_transaction was not
defined anywhere (I did grep it through the kernel tree, nothing showed
up), did I miss anything?

fs/btrfs/tree-log.c: In function ‘check_parent_dirs_for_sync’:
fs/btrfs/tree-log.c:4836:4: error: implicit declaration of function 
‘btrfs_must_commit_transaction’ [-Werror=implicit-function-declaration]
if (btrfs_must_commit_transaction(trans, inode))
^
cc1: some warnings being treated as errors
make[1]: *** [fs/btrfs/tree-log.o] Error 1
make: *** [_module_fs/btrfs] Error 2

> 
> Signed-off-by: Filipe Manana 
> ---
>  tests/btrfs/120 | 91 
> +
>  tests/btrfs/120.out | 12 +++
>  tests/btrfs/group   |  1 +
>  3 files changed, 104 insertions(+)
>  create mode 100755 tests/btrfs/120
>  create mode 100644 tests/btrfs/120.out
> 
> diff --git a/tests/btrfs/120 b/tests/btrfs/120
> new file mode 100755
> index 000..4ede4e6
> --- /dev/null
> +++ b/tests/btrfs/120
> @@ -0,0 +1,91 @@
> +#! /bin/bash
> +# FSQA Test No. 120
> +#
> +# Test that if we delete a snapshot, delete its parent directory, create
> +# another directory with the same name as that parent and then fsync either
> +# the new directory or a file inside the new directory, the fsync succeeds,
> +# the fsync log is replayable and produces a correct result.
> +#
> +#---
> +#
> +# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + _cleanup_flakey
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_require_dm_target flakey
> +_require_metadata_journaling $SCRATCH_DEV
> +
> +rm -f $seqres.full
> +
> +_populate_fs()
> +{
> + _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \
> + $SCRATCH_MNT/testdir/snap
> + _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap
> + rmdir $SCRATCH_MNT/testdir
> + mkdir $SCRATCH_MNT/testdir
> +}

There's already a _populate_fs function in common/rc, though this one
could override it, I think it's better to define a different function
name, e.g. without the leading underscore, as it's only used locally.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH Resend] fstests: btrfs, test log replay with qgroups enabled and orphan roots

2016-03-24 Thread Eryu Guan
On Wed, Mar 23, 2016 at 02:49:15AM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Test that replaying a log tree when qgroups are enabled and orphan roots
> (deleted snapshots) exist, the replay process does not crash.
> 
> This is motivated by a bug found in btrfs, introduced in the linux kernel
> 4.4 release, and is fixed by the linux kernel commit 909c3a22da3b
> ("Btrfs: fix loading of orphan roots leading to BUG_ON") that landed in
> kernel 4.5.
> 
> Signed-off-by: Filipe Manana 

Looks good to me, test passed on v4.5 kernel as expected.

Reviewed-by: Eryu Guan 

> ---
> 
> Resending as it was missing in the last git update.
> 
>  tests/btrfs/119 | 116 
> 
>  tests/btrfs/119.out |   9 
>  tests/btrfs/group   |   1 +
>  3 files changed, 126 insertions(+)
>  create mode 100755 tests/btrfs/119
>  create mode 100644 tests/btrfs/119.out
> 
> diff --git a/tests/btrfs/119 b/tests/btrfs/119
> new file mode 100755
> index 000..cf07550
> --- /dev/null
> +++ b/tests/btrfs/119
> @@ -0,0 +1,116 @@
> +#! /bin/bash
> +# FSQA Test No. 119
> +#
> +# Test log tree replay when qgroups are enabled and orphan roots (deleted
> +# snapshots) exist.
> +#
> +#---
> +#
> +# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + _cleanup_flakey
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_require_dm_target flakey
> +_require_metadata_journaling $SCRATCH_DEV
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_init_flakey
> +_mount_flakey
> +
> +_run_btrfs_util_prog quota enable $SCRATCH_MNT
> +
> +# Create 2 directories with one file in one of them.
> +# We use these just to trigger a transaction commit later, moving the file 
> from
> +# directory a to directory b and doing an fsync against directory a.
> +mkdir $SCRATCH_MNT/a
> +mkdir $SCRATCH_MNT/b
> +touch $SCRATCH_MNT/a/f
> +sync
> +
> +# Create our test file with 2 4K extents.
> +$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | 
> _filter_xfs_io
> +
> +# Create a snapshot and delete it. This doesn't really delete the snapshot
> +# immediately, just makes it inaccessible and invisible to user space, the
> +# snapshot is deleted later by a dedicated kernel thread (cleaner kthread)
> +# which is woke up at the next transaction commit.
> +# A root orphan item is inserted into the tree of tree roots, so that if a
> +# power failure happens before the dedicated kernel thread does the snapshot
> +# deletion, the next time the filesystem is mounted it resumes the snapshot
> +# deletion.
> +_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
> +_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
> +
> +# Now overwrite half of the extents we wrote before. Because we made a 
> snapshpot
> +# before, which isn't really deleted yet (since no transaction commit 
> happened
> +# after we did the snapshot delete request), the non overwritten extents get
> +# referenced twice, once by the default subvolume and once by the snapshot.
> +$XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
> +
> +# Now move file f from directory a to directory b and fsync directory a.
> +# The fsync on the directory a triggers a transaction commit (because a file
> +# was moved from it to another directory) and the file fsync leaves a log 
> tree
> +# with file extent items to replay.
> +mv $SCRATCH_MNT/a/f $SCRATCH_MNT/a/b
> +$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
> +$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
> +
> +echo "File digest before power failure:"
> +md5sum $SCRATCH_MNT/foobar | _filter_scratch
> +
> +# Now simulate a power failure and mount the filesystem to replay the log 
> 

Re: [PATCH v8 10/27] btrfs: dedupe: Add basic tree structure for on-disk dedupe method

2016-03-24 Thread Qu Wenruo



Chris Mason wrote on 2016/03/24 16:58 -0400:

On Tue, Mar 22, 2016 at 09:35:35AM +0800, Qu Wenruo wrote:

Introduce a new tree, dedupe tree to record on-disk dedupe hash.
As a persist hash storage instead of in-memeory only implement.

Unlike Liu Bo's implement, in this version we won't do hack for
bytenr -> hash search, but add a new type, DEDUP_BYTENR_ITEM for such
search case, just like in-memory backend.


Thanks for refreshing this again, I'm starting to go through the disk
format in more detail.



Signed-off-by: Liu Bo 
Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
  fs/btrfs/ctree.h | 63 +++-
  fs/btrfs/dedupe.h|  5 
  fs/btrfs/disk-io.c   |  1 +
  include/trace/events/btrfs.h |  3 ++-
  4 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 022ab61..bed9273 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -100,6 +100,9 @@ struct btrfs_ordered_sum;
  /* tracks free space in block groups. */
  #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL

+/* on-disk dedupe tree (EXPERIMENTAL) */
+#define BTRFS_DEDUPE_TREE_OBJECTID 11ULL
+
  /* device stats in the device tree */
  #define BTRFS_DEV_STATS_OBJECTID 0ULL

@@ -508,6 +511,7 @@ struct btrfs_super_block {
   * ones specified below then we will fail to mount
   */
  #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE   (1ULL << 0)
+#define BTRFS_FEATURE_COMPAT_RO_DEDUPE (1ULL << 1)

  #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF  (1ULL << 0)
  #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1)
@@ -537,7 +541,8 @@ struct btrfs_super_block {
  #define BTRFS_FEATURE_COMPAT_SAFE_CLEAR   0ULL

  #define BTRFS_FEATURE_COMPAT_RO_SUPP  \
-   (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE)
+   (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |  \
+BTRFS_FEATURE_COMPAT_RO_DEDUPE)

  #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET  0ULL
  #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR0ULL
@@ -959,6 +964,42 @@ struct btrfs_csum_item {
u8 csum;
  } __attribute__ ((__packed__));

+/*
+ * Objectid: 0
+ * Type: BTRFS_DEDUPE_STATUS_ITEM_KEY
+ * Offset: 0
+ */
+struct btrfs_dedupe_status_item {
+   __le64 blocksize;
+   __le64 limit_nr;
+   __le16 hash_type;
+   __le16 backend;
+} __attribute__ ((__packed__));
+
+/*
+ * Objectid: Last 64 bit of the hash
+ * Type: BTRFS_DEDUPE_HASH_ITEM_KEY
+ * Offset: Bytenr of the hash
+ *
+ * Used for hash <-> bytenr search
+ */
+struct btrfs_dedupe_hash_item {
+   /* length of dedupe range */
+   __le32 len;
+
+   /* Hash follows */
+} __attribute__ ((__packed__));


Are you storing the entire hash, or just the parts not represented in
the key?  I'd like to keep the on-disk part as compact as possible for
this part.


Currently, it's entire hash.

More detailed can be checked in another mail.
http://article.gmane.org/gmane.comp.file-systems.btrfs/54432

Although it's OK to truncate the last duplicated 8 bytes(64bit) for me,
I still quite like current implementation, as one memcpy() is simpler.




+
+/*
+ * Objectid: bytenr
+ * Type: BTRFS_DEDUPE_BYTENR_ITEM_KEY
+ * offset: Last 64 bit of the hash
+ *
+ * Used for bytenr <-> hash search (for free_extent)
+ * all its content is hash.
+ * So no special item struct is needed.
+ */
+


Can we do this instead with a backref from the extent?  It'll save us a
huge amount of IO as we delete things.


That's the original implementation from Liu Bo.

The problem is, it changes the data backref rules(originally, only 
EXTENT_DATA item can cause data backref), and will make dedupe INCOMPACT 
other than current RO_COMPACT.

So I really don't like to change the data backref rule.


If only want to reduce ondisk space, just trashing the hash and making 
DEDUPE_BYTENR_ITEM have no data would be good enough.


As (bytenr, DEDEUPE_BYTENR_ITEM) can locate the hash uniquely.

In fact no code really checked the hash for dedupe bytenr item, they all 
just swap objectid and offset, reset the type and do search for 
DEDUPE_HASH_ITEM.


So it's OK to emit the hash.

Thanks,
Qu



-chris





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: mkfs: allow DUP on multidev fs, only warn

2016-03-24 Thread Duncan
David Sterba posted on Thu, 24 Mar 2016 16:56:13 +0100 as excerpted:

> The DUP profile can work on multiple filesystems, the limitation is
> rather artificial. Let the user make the decision and print a warning.
> 
> Signed-off-by: David Sterba 
> ---

I like the change, but typo in the commit comment... What filesystems 
other than btrfs allow DUP? =:^)  It's correct in the posting subject.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: mkfs: fix an error when using DUP on multidev fs

2016-03-24 Thread Satoru Takeuchi
To accept DUP on multidev fs, in addition to the following
commit, we need to mark DUP as an allowed data/metadata
profile.

commit 42f1279bf8e9 ("btrfs-progs: mkfs: allow DUP on multidev fs, only warn")

* actual result

  =
  # ./mkfs.btrfs -f -m DUP -d DUP /dev/sdb1 /dev/sdb2
  btrfs-progs v4.5-24-ga35b7e6
  See http://btrfs.wiki.kernel.org for more information.

  WARNING: DUP is not recommended on filesystem with multiple devices
  ERROR: unable to create FS with metadata profile DUP (have 2 devices but 1 
devices are required)
  =

* expected result

  =
  # ./mkfs.btrfs -f -m dup -d dup /dev/sdb1 /dev/sdb2
  WARNING: DUP is not recommended on filesystem with multiple devices
  btrfs-progs v4.5-25-g1a10a3c
  See http://btrfs.wiki.kernel.org for more information.

  Label:  (null)
  UUID:   010d72ff-c87c-4516-8916-5e635719d110
  Node size:  16384
  Sector size:4096
  Filesystem size:28.87GiB
  Block group profiles:
Data: DUP   1.01GiB
Metadata: DUP   1.01GiB
System:   DUP  12.00MiB
  SSD detected:   no
  Incompat features:  extref, skinny-metadata
  Number of devices:  2
  Devices:
 IDSIZE  PATH
  1   953.00MiB  /dev/sdb1
  227.94GiB  /dev/sdb2

  ==

Signed-off-by: Satoru Takeuchi 
---
This patch can be applied to devel branch (commit: a35b7e6ee122)
---
 utils.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/utils.c b/utils.c
index 75ce6ea..7e45702 100644
--- a/utils.c
+++ b/utils.c
@@ -2455,7 +2455,6 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 
data_profile,
case 2:
allowed |= BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
BTRFS_BLOCK_GROUP_RAID5;
-   break;
case 1:
allowed |= BTRFS_BLOCK_GROUP_DUP;
}
-- 
2.5.0
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 25/27] btrfs: dedupe: Add support for compression and dedpue

2016-03-24 Thread Qu Wenruo



Chris Mason wrote on 2016/03/24 16:35 -0400:

On Tue, Mar 22, 2016 at 09:35:50AM +0800, Qu Wenruo wrote:

From: Wang Xiaoguang 

The basic idea is also calculate hash before compression, and add needed
members for dedupe to record compressed file extent.

Since dedupe support dedupe_bs larger than 128K, which is the up limit
of compression file extent, in that case we will skip dedupe and prefer
compression, as in that size dedupe rate is low and compression will be
more obvious.

Current implement is far from elegant. The most elegant one should split
every data processing method into its own and independent function, and
have a unified function to co-operate them.


I'd leave this one out for now, it looks like we need to refine the
pipeline from dedup -> compression and this is just more to carry around
until the initial support is in.  Can you just decline to dedup
compressed extents for now?


Yes, completely no problem.
Although this patch seems works well yet, but I also have planned to 
rework current run_delloc_range() to make it more flex and clear.


So the main object of the patch is more about raising attention for such 
further re-work.


And now it has achieved its goal.

Thanks,
Qu


-chris





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework

2016-03-24 Thread Qu Wenruo



David Sterba wrote on 2016/03/24 14:42 +0100:

On Wed, Mar 23, 2016 at 10:25:51AM +0800, Qu Wenruo wrote:

Thank you for your interest in dedupe patchset first.

In fact I'm quite afraid if there is no one interest in the patchset, it
may be delayed again to 4.8.


It's not about lack of interest, the high-profile features need time and
input from several people that will supposedly cover all aspects.


David Sterba wrote on 2016/03/22 14:38 +0100:
There are 3 dedupe related on-disk items.

1) dedupe status
 Used by both dedupe backends. Mainly used to record the dedupe
 backend info, allowing btrfs to resume its dedupe setup after umount.

Key contents:
 Objectid , Type   , Offset
(0, DEDUPE_STATUS_ITEM_KEY , 0  )


Please use the newly added BTRFS_PERSISTENT_ITEM_KEY instead of a new
key type. As this is the second user of that item, there's no precendent
how to select the subtype. Right now 0 is for the dev stats item, but
I'd like to leave some space between them, so it should be 256 at best.
The space is 64bit so there's enough room but this also means defining
the on-disk format.


After checking BTRFS_PERSISENT_ITEM_KEY, it seems that its value is 
larger than current DEDUPE_BYTENR/HASH_ITEM_KEY, and since the objectid 
of DEDUPE_HASH_ITEM_KEY, it won't be the first item of the tree.


Although that's not a big problem, but for user using debug-tree, it 
would be quite annoying to find it located among tons of other hashes.


So personally, if using PERSISTENT_ITEM_KEY, at least I prefer to keep 
objectid to 0, and modify DEDUPE_BYTENR/HASH_ITEM_KEY to higher value, 
to ensure dedupe status to be the first item of dedupe tree.





4) Ioctl interface with persist dedup status


I'd like to see the ioctl specified in more detail. So far there's
enable, disable and status. I'd expect some way to control the in-memory
limits, let it "forget" current hash cache, specify the dedupe chunk
size, maybe sync of the in-memory hash cache to disk.


So current and planned ioctl should be the following, with some details
related to your in-memory limit control concerns.

1) Enable
 Enable dedupe if it's not enabled already. (disabled -> enabled)


Ok, so it should also take a parameter which bckend is about to be
enabled.


It already has.
It also has limit_nr and limit_mem parameter for in-memory backend.




 Or change current dedupe setting to another. (re-configure)


Doing that in 'enable' sounds confusing, any changes belong to a
separate command.


This depends the aspect of view.

For "Enable/config/disable" case, it will introduce a state machine for 
end-user.


Personally, I doesn't state machine for end user. Yes, I also hate 
merging play and pause button together on music player.


If using state machine, user must ensure the dedupe is enabled before 
doing any configuration.


For me, user only need to care the result of the operation. User can now 
configure dedupe to their need without need to know previous setting.
From this aspect of view, "Enable/Disable" is much easier than 
"Enable/Config/Disable".





 For dedupe_bs/backend/hash algorithm(only SHA256 yet) change, it
 will disable dedupe(dropping all hash) and then enable with new
 setting.

 For in-memory backend, if only limit is different from previous
 setting, limit can be changed on the fly without dropping any hash.


This is obviously misplaced in 'enable'.


Then, changing the 'enable' to 'configure' or other proper naming would 
be better.


The point is, user only need to care what they want to do, not previous 
setup.





2) Disable
 Disable will drop all hash and delete the dedupe tree if it exists.
 Imply a full sync_fs().


That is again combining too many things into one. Say I want to disable
deduplication and want to enable it later. And not lose the whole state
between that. Not to say deleting the dedup tree.

IOW, deleting the tree belongs to a separate command, though in the
userspace tools it could be done in one command, but we're talking about
the kernel ioctls now.

I'm not sure if the sync is required, but it's acceptable for first
implementation.


The design is just to to reduce complexity.
If want to keep hash but disable dedupe, it will make dedupe only handle 
extent remove, but ignore any new coming write.


It will introduce a new state for dedupe, other than current simple 
enabled/disabled.

So I just don't allow such mode.





3) Status
 Output basic status of current dedupe.
 Including running status(disabled/enabled), dedupe block size, hash
 algorithm, and limit setting for in-memory backend.


Agreed. So this is basically the settings and static info.


4) (PLANNED) In-memory hash size querying
 Allowing userspace to query in-memory hash structure header size.
 Used for "btrfs dedupe enable" '-l' option to output warning if user
 specify memory size larger than 1/4 of 

Lockdep warning at device replace finishing

2016-03-24 Thread Yauhen Kharuzhy
Hi.

Can anybody point me to possible cause of this lockdep warning and say
if it is dangerous in reality?
It appeared when I started replacing from the missing drive ('btrfs replace
start   ). My locking-fu seems to be too weak
to resolve this by myself.

I use 4.4.5 kernel with Anand's global hotspare patches applied plus
patch 'Btrfs: fix lockdep deadlock warning due to dev_replace' was
applied also. This is reproduced rarely, so I cannot check this for sure
with latest kernels.


[ 1231.862717] BTRFS info (device sdf): dev_replace from /dev/sdf (devid 4) to 
/dev/sdh finished
[ 1231.881938] 
[ 1231.888624] ==
[ 1231.943857] [ INFO: possible circular locking dependency detected ]
[ 1231.953686] 4.4.5-scst31x+ #16 Not tainted
[ 1231.954755] ---
[ 1232.031044] btrfs/4453 is trying to acquire lock:
[ 1232.031966]  (sb_writers){.+.+.+}, at: [] 
__sb_start_write+0xb7/0xf0
[ 1232.033945] 
[ 1232.033945] but task is already holding lock:
[ 1232.035238]  (&fs_info->chunk_mutex){+.+.+.}, at: [] 
btrfs_dev_replace_finishing+0x121/0x9b0 [btrfs]
[ 1232.037577] 
[ 1232.037577] which lock already depends on the new lock.
[ 1232.037577] 
[ 1232.039352] 
[ 1232.039352] the existing dependency chain (in reverse order) is:
[ 1232.040817] 
-> #4 (&fs_info->chunk_mutex){+.+.+.}:
[ 1232.042429][] lock_acquire+0xce/0x1e0
[ 1232.090783][] mutex_lock_nested+0x69/0x3c0
[ 1232.093226][] 
btrfs_dev_replace_finishing+0x121/0x9b0 [btrfs]
[ 1232.094913][] btrfs_dev_replace_start+0x390/0x560 
[btrfs]
[ 1232.096798][] btrfs_ioctl+0x1e4c/0x2c00 [btrfs]
[ 1232.098396][] do_vfs_ioctl+0x2e2/0x530
[ 1232.099596][] SyS_ioctl+0x79/0x90
[ 1232.100725][] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 1232.102031] 
-> #3 (&fs_devs->device_list_mutex){+.+.+.}:
[ 1232.103721][] lock_acquire+0xce/0x1e0
[ 1232.104888][] mutex_lock_nested+0x69/0x3c0
[ 1232.106106][] btrfs_show_devname+0x36/0x210 [btrfs]
[ 1232.107444][] show_vfsmnt+0x49/0x150
[ 1232.108603][] m_show+0x17/0x20
[ 1232.109696][] seq_read+0x2d8/0x3b0
[ 1232.60][] __vfs_read+0x28/0xd0
[ 1232.113660][] vfs_read+0x86/0x130
[ 1232.115847][] SyS_read+0x49/0xa0
[ 1232.116963][] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 1232.118268] 
-> #2 (namespace_sem){+.}:
[ 1232.119795][] lock_acquire+0xce/0x1e0
[ 1232.125023][] down_write+0x49/0x80
[ 1232.150746][] lock_mount+0x43/0x1c0
[ 1232.151914][] do_add_mount+0x23/0xd0
[ 1232.157188][] do_mount+0x27b/0xe30
[ 1232.158416][] SyS_mount+0x8c/0xd0
[ 1232.159622][] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 1232.189142] 
-> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}:
[ 1232.192923][] lock_acquire+0xce/0x1e0
[ 1232.211173][] mutex_lock_nested+0x69/0x3c0
[ 1232.223543][] path_openat+0x468/0x1360
[ 1232.237187][] do_filp_open+0x7e/0xe0
[ 1232.239007][] do_sys_open+0x12b/0x210
[ 1232.241578][] SyS_open+0x1e/0x20
[ 1232.242711][] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 1232.244018] 
-> #0 (sb_writers){.+.+.+}:
[ 1232.245504][] __lock_acquire+0x17ba/0x1ae0
[ 1232.246742][] lock_acquire+0xce/0x1e0
[ 1232.247934][] percpu_down_read+0x4f/0xa0
[ 1232.260955][] __sb_start_write+0xb7/0xf0
[ 1232.262264][] mnt_want_write+0x24/0x50
[ 1232.263460][] path_openat+0xd32/0x1360
[ 1232.264644][] do_filp_open+0x7e/0xe0
[ 1232.267248][] file_open_name+0xe4/0x130
[ 1232.268454][] filp_open+0x33/0x60
[ 1232.269584][] update_dev_time+0x16/0x40 [btrfs]
[ 1232.314209][] btrfs_scratch_superblocks+0x4e/0x90 
[btrfs]
[ 1232.315874][] 
btrfs_rm_dev_replace_remove_srcdev+0xa2/0xc0 [btrfs]
[ 1232.317610][] 
btrfs_dev_replace_finishing+0x490/0x9b0 [btrfs]
[ 1232.319291][] btrfs_dev_replace_start+0x390/0x560 
[btrfs]
[ 1232.325306][] btrfs_ioctl+0x1e4c/0x2c00 [btrfs]
[ 1232.326612][] do_vfs_ioctl+0x2e2/0x530
[ 1232.327808][] SyS_ioctl+0x79/0x90
[ 1232.328935][] entry_SYSCALL_64_fastpath+0x16/0x7a
[ 1232.330243] 
[ 1232.330243] other info that might help us debug this:
[ 1232.330243] 
[ 1232.414976] Chain exists of:
  sb_writers --> &fs_devs->device_list_mutex --> &fs_info->chunk_mutex

[ 1232.417570]  Possible unsafe locking scenario:
[ 1232.417570] 
[ 1232.423489]CPU0CPU1
[ 1232.424379]
[ 1232.425273]   lock(&fs_info->chunk_mutex);
[ 1232.426341]lock(&fs_devs->device_list_mutex);
[ 1232.427786]lock(&fs_info->chunk_mutex);
[ 1232.429158]   lock(sb_writers);
[ 1232.430097] 
[ 1232.430097]  *** DEADLOCK ***
[ 1232.430097] 
[ 1232.466399] 4 locks held by btrfs/4453:
[ 1232.467217]  #0:

Re: moving btrfs subvolumes to new disk

2016-03-24 Thread Chris Murphy
On Wed, Mar 23, 2016 at 8:08 PM, Ryan Erato  wrote:
> Success! Using the same ISO you previously linked to, I ran 'btrfs
> check --repair', did another check which actually revealed many new
> issues, ran a repair again and after that successive checks showed no
> signs of other issues. I was able to successfully send my 'home'
> subvolume to the SSD and wow, huge difference with otherwise little
> effort.
>
> Thanks to you and everybody else for helping me resolve this issue.

Yay, good news!


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs_destroy_inode WARN_ON.

2016-03-24 Thread Dave Jones
Just hit this on a tree from earlier this morning, v4.5-11140 or so.

WARNING: CPU: 2 PID: 32570 at fs/btrfs/inode.c:9261 
btrfs_destroy_inode+0x389/0x3f0 [btrfs]
CPU: 2 PID: 32570 Comm: rm Not tainted 4.5.0-think+ #14
 c039baf9 ef721ef0 88025966fc08 8957bcdb
   88025966fc50 890b41f1
 88045d918040 242d4eed6048 88024eed6048 88024eed6048
Call Trace:
 [] ? btrfs_destroy_inode+0x389/0x3f0 [btrfs]
 [] dump_stack+0x68/0x9d
 [] __warn+0x111/0x130
 [] warn_slowpath_null+0x1d/0x20
 [] btrfs_destroy_inode+0x389/0x3f0 [btrfs]
 [] destroy_inode+0x67/0x90
 [] evict+0x1b7/0x240
 [] iput+0x3ae/0x4e0
 [] ? dput+0x20e/0x460
 [] do_unlinkat+0x256/0x440
 [] ? do_rmdir+0x350/0x350
 [] ? syscall_trace_enter_phase1+0x87/0x260
 [] ? enter_from_user_mode+0x50/0x50
 [] ? __lock_is_held+0x25/0xd0
 [] ? mark_held_locks+0x22/0xc0
 [] ? syscall_trace_enter_phase2+0x12d/0x3d0
 [] ? SyS_rmdir+0x20/0x20
 [] SyS_unlinkat+0x1b/0x30
 [] do_syscall_64+0xf4/0x240
 [] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace a48ce4e6a1b5e409 ]---


That's WARN_ON(BTRFS_I(inode)->csum_bytes);

*maybe* it's a bad disk, but there's no indication in dmesg of anything awry.
Spinning rust on SATA, nothing special.

Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 10/27] btrfs: dedupe: Add basic tree structure for on-disk dedupe method

2016-03-24 Thread Chris Mason
On Tue, Mar 22, 2016 at 09:35:35AM +0800, Qu Wenruo wrote:
> Introduce a new tree, dedupe tree to record on-disk dedupe hash.
> As a persist hash storage instead of in-memeory only implement.
> 
> Unlike Liu Bo's implement, in this version we won't do hack for
> bytenr -> hash search, but add a new type, DEDUP_BYTENR_ITEM for such
> search case, just like in-memory backend.

Thanks for refreshing this again, I'm starting to go through the disk
format in more detail.

> 
> Signed-off-by: Liu Bo 
> Signed-off-by: Wang Xiaoguang 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/ctree.h | 63 
> +++-
>  fs/btrfs/dedupe.h|  5 
>  fs/btrfs/disk-io.c   |  1 +
>  include/trace/events/btrfs.h |  3 ++-
>  4 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 022ab61..bed9273 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -100,6 +100,9 @@ struct btrfs_ordered_sum;
>  /* tracks free space in block groups. */
>  #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
>  
> +/* on-disk dedupe tree (EXPERIMENTAL) */
> +#define BTRFS_DEDUPE_TREE_OBJECTID 11ULL
> +
>  /* device stats in the device tree */
>  #define BTRFS_DEV_STATS_OBJECTID 0ULL
>  
> @@ -508,6 +511,7 @@ struct btrfs_super_block {
>   * ones specified below then we will fail to mount
>   */
>  #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE  (1ULL << 0)
> +#define BTRFS_FEATURE_COMPAT_RO_DEDUPE   (1ULL << 1)
>  
>  #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF (1ULL << 0)
>  #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL(1ULL << 1)
> @@ -537,7 +541,8 @@ struct btrfs_super_block {
>  #define BTRFS_FEATURE_COMPAT_SAFE_CLEAR  0ULL
>  
>  #define BTRFS_FEATURE_COMPAT_RO_SUPP \
> - (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE)
> + (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |  \
> +  BTRFS_FEATURE_COMPAT_RO_DEDUPE)
>  
>  #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET 0ULL
>  #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR   0ULL
> @@ -959,6 +964,42 @@ struct btrfs_csum_item {
>   u8 csum;
>  } __attribute__ ((__packed__));
>  
> +/*
> + * Objectid: 0
> + * Type: BTRFS_DEDUPE_STATUS_ITEM_KEY
> + * Offset: 0
> + */
> +struct btrfs_dedupe_status_item {
> + __le64 blocksize;
> + __le64 limit_nr;
> + __le16 hash_type;
> + __le16 backend;
> +} __attribute__ ((__packed__));
> +
> +/*
> + * Objectid: Last 64 bit of the hash
> + * Type: BTRFS_DEDUPE_HASH_ITEM_KEY
> + * Offset: Bytenr of the hash
> + *
> + * Used for hash <-> bytenr search
> + */
> +struct btrfs_dedupe_hash_item {
> + /* length of dedupe range */
> + __le32 len;
> +
> + /* Hash follows */
> +} __attribute__ ((__packed__));

Are you storing the entire hash, or just the parts not represented in
the key?  I'd like to keep the on-disk part as compact as possible for
this part.

> +
> +/*
> + * Objectid: bytenr
> + * Type: BTRFS_DEDUPE_BYTENR_ITEM_KEY
> + * offset: Last 64 bit of the hash
> + *
> + * Used for bytenr <-> hash search (for free_extent)
> + * all its content is hash.
> + * So no special item struct is needed.
> + */
> +

Can we do this instead with a backref from the extent?  It'll save us a
huge amount of IO as we delete things.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 25/27] btrfs: dedupe: Add support for compression and dedpue

2016-03-24 Thread Chris Mason
On Tue, Mar 22, 2016 at 09:35:50AM +0800, Qu Wenruo wrote:
> From: Wang Xiaoguang 
> 
> The basic idea is also calculate hash before compression, and add needed
> members for dedupe to record compressed file extent.
> 
> Since dedupe support dedupe_bs larger than 128K, which is the up limit
> of compression file extent, in that case we will skip dedupe and prefer
> compression, as in that size dedupe rate is low and compression will be
> more obvious.
> 
> Current implement is far from elegant. The most elegant one should split
> every data processing method into its own and independent function, and
> have a unified function to co-operate them.

I'd leave this one out for now, it looks like we need to refine the
pipeline from dedup -> compression and this is just more to carry around
until the initial support is in.  Can you just decline to dedup
compressed extents for now?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Slow delete and shutdown

2016-03-24 Thread Pete
Hi.

I'm replacing a disk in my system.  I've added the new drive and I'm
deleting one which is causing data to be written to the new drive.
However, progress is painfully slow.  150GB of approx. 1.8TB written to
new disk in about 12 hours. SO about 6 days to go.  I know it is a slow
process but this seems excessive.

I want to shut down the system in this period, would that be OK?  Will
it resume on boot or would I just re-issue the delete command?

Pete


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fstests: add btrfs test for fsync after snapshot deletion

2016-03-24 Thread fdmanana
From: Filipe Manana 

Test that if we delete a snapshot, delete its parent directory, create
another directory with the same name as that parent and then fsync either
the new directory or a file inside the new directory, the fsync succeeds,
the fsync log is replayable and produces a correct result.

This is motivated by a bug that is fixed by the following patch for
btrfs (linux kernel):

  Btrfs: fix unreplayable log after snapshot deletion and parent
  re-creation

Signed-off-by: Filipe Manana 
---
 tests/btrfs/120 | 91 +
 tests/btrfs/120.out | 12 +++
 tests/btrfs/group   |  1 +
 3 files changed, 104 insertions(+)
 create mode 100755 tests/btrfs/120
 create mode 100644 tests/btrfs/120.out

diff --git a/tests/btrfs/120 b/tests/btrfs/120
new file mode 100755
index 000..4ede4e6
--- /dev/null
+++ b/tests/btrfs/120
@@ -0,0 +1,91 @@
+#! /bin/bash
+# FSQA Test No. 120
+#
+# Test that if we delete a snapshot, delete its parent directory, create
+# another directory with the same name as that parent and then fsync either
+# the new directory or a file inside the new directory, the fsync succeeds,
+# the fsync log is replayable and produces a correct result.
+#
+#---
+#
+# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   _cleanup_flakey
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_dm_target flakey
+_require_metadata_journaling $SCRATCH_DEV
+
+rm -f $seqres.full
+
+_populate_fs()
+{
+   _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \
+   $SCRATCH_MNT/testdir/snap
+   _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap
+   rmdir $SCRATCH_MNT/testdir
+   mkdir $SCRATCH_MNT/testdir
+}
+
+_scratch_mkfs >>$seqres.full 2>&1
+_init_flakey
+_mount_flakey
+
+mkdir $SCRATCH_MNT/testdir
+_populate_fs
+$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir
+_flakey_drop_and_remount
+
+echo "Filesystem contents after the first log replay:"
+ls -R $SCRATCH_MNT | _filter_scratch
+
+# Now do the same as before but instead of doing an fsync against the 
directory,
+# do an fsync against a file inside the directory.
+
+_populate_fs
+touch $SCRATCH_MNT/testdir/foobar
+$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foobar
+_flakey_drop_and_remount
+
+echo "Filesystem contents after the second log replay:"
+ls -R $SCRATCH_MNT | _filter_scratch
+
+_unmount_flakey
+status=0
+exit
diff --git a/tests/btrfs/120.out b/tests/btrfs/120.out
new file mode 100644
index 000..4210bfa
--- /dev/null
+++ b/tests/btrfs/120.out
@@ -0,0 +1,12 @@
+QA output created by 120
+Filesystem contents after the first log replay:
+SCRATCH_MNT:
+testdir
+
+SCRATCH_MNT/testdir:
+Filesystem contents after the second log replay:
+SCRATCH_MNT:
+testdir
+
+SCRATCH_MNT/testdir:
+foobar
diff --git a/tests/btrfs/group b/tests/btrfs/group
index d312874..13aa1e5 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -120,3 +120,4 @@
 117 auto quick send clone
 118 auto quick snapshot metadata
 119 auto quick snapshot metadata qgroup
+120 auto quick snapshot metadata
-- 
2.7.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix unreplayable log after snapshot deletion and parent re-creation

2016-03-24 Thread fdmanana
From: Filipe Manana 

If we delete a snapshot, delete its parent directory, create a new
directory with the same name as that parent and then fsync either that
new directory or some file inside it, we end up with a log tree that
is not possible to replay because the log replay procedure interprets
the snapshot's directory item as a regular entry and not as a root
item, resulting in the following failure and trace when mounting the
filesystem:

[52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, 
inode 257 parent 257
[52174.512570] [ cut here ]
[52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 
__btrfs_unlink_inode+0x178/0x351 [btrfs]()
[52174.514681] BTRFS: Transaction aborted (error -2)
[52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic 
ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev 
i2c_piix4 proc
[52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: GW   
4.5.0-rc6-btrfs-next-27+ #1
[52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by 
qemu-project.org 04/01/2014
[52174.524053]   8801df2a7710 81264e93 
8801df2a7758
[52174.524053]  0009 8801df2a7748 81051618 
a03591cd
[52174.524053]  fffe 88015e6e5000 88016dbc3c88 
88016dbc3c88
[52174.524053] Call Trace:
[52174.524053]  [] dump_stack+0x67/0x90
[52174.524053]  [] warn_slowpath_common+0x99/0xb2
[52174.524053]  [] ? __btrfs_unlink_inode+0x178/0x351 [btrfs]
[52174.524053]  [] warn_slowpath_fmt+0x48/0x50
[52174.524053]  [] __btrfs_unlink_inode+0x178/0x351 [btrfs]
[52174.524053]  [] ? iput+0xb0/0x284
[52174.524053]  [] btrfs_unlink_inode+0x1c/0x3d [btrfs]
[52174.524053]  [] check_item_in_log+0x1fe/0x29b [btrfs]
[52174.524053]  [] replay_dir_deletes+0x167/0x1cf [btrfs]
[52174.524053]  [] fixup_inode_link_count+0x289/0x2aa [btrfs]
[52174.524053]  [] fixup_inode_link_counts+0xcb/0x105 [btrfs]
[52174.524053]  [] btrfs_recover_log_trees+0x258/0x32c [btrfs]
[52174.524053]  [] ? replay_one_extent+0x511/0x511 [btrfs]
[52174.524053]  [] open_ctree+0x1dd4/0x21b9 [btrfs]
[52174.524053]  [] btrfs_mount+0x97e/0xaed [btrfs]
[52174.524053]  [] ? trace_hardirqs_on+0xd/0xf
[52174.524053]  [] mount_fs+0x67/0x131
[52174.524053]  [] vfs_kern_mount+0x6c/0xde
[52174.524053]  [] btrfs_mount+0x1ac/0xaed [btrfs]
[52174.524053]  [] ? trace_hardirqs_on+0xd/0xf
[52174.524053]  [] ? lockdep_init_map+0xb9/0x1b3
[52174.524053]  [] mount_fs+0x67/0x131
[52174.524053]  [] vfs_kern_mount+0x6c/0xde
[52174.524053]  [] do_mount+0x8a6/0x9e8
[52174.524053]  [] ? strndup_user+0x3f/0x59
[52174.524053]  [] SyS_mount+0x77/0x9f
[52174.524053]  [] entry_SYSCALL_64_fastpath+0x12/0x6b
[52174.561288] ---[ end trace 6b53049efb1a3ea6 ]---

So when we delete a directory we need to propagate its last_unlink_trans
value (updated on snapshot deletion) to its parent and then check at
fsync time for it and fallback for a transaction commit.

A test case for fstests follows.

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1  # failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
  _cleanup_flakey
  cd /
  rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_dm_target flakey
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _populate_fs()
  {
  _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \
  $SCRATCH_MNT/testdir/snap
  _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap
  rmdir $SCRATCH_MNT/testdir
  mkdir $SCRATCH_MNT/testdir
  }

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  mkdir $SCRATCH_MNT/testdir
  _populate_fs
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir
  _flakey_drop_and_remount

  echo "Filesystem contents after the first log replay:"
  ls -R $SCRATCH_MNT | _filter_scratch

  # Now do the same as before but instead of doing an fsync against the 
directory,
  # do an fsync against a file inside the directory.

  _populate_fs
  touch $SCRATCH_MNT/testdir/foobar
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foobar
  _flakey_drop_and_remount

  echo "Filesystem contents after the second log replay:"
  ls -R $SCRATCH_MNT | _filter_scratch

  _unmount_flakey
  status=0
  exit

Signed-off-by: Filipe Manana 
---
 fs/btrfs/inode.c| 22 +-
 fs/btrfs/tree-log.c |  6 +-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 41a5688..b5c23b5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4173,6 +4173,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry 
*dentry)
int err = 0;
struct btrfs_root *root = BTR

Re: [PATCH 1/1] Btrfs: Code Cleanup

2016-03-24 Thread David Sterba
On Thu, Mar 24, 2016 at 04:08:18PM +0100, Petr Tesarik wrote:
> On Thu, 24 Mar 2016 16:03:07 +0100
> David Sterba  wrote:
> 
> > On Sun, Mar 20, 2016 at 03:11:11PM +0800, Flex Liu wrote:
> >[...]
> > > --- a/fs/btrfs/volumes.c
> > > +++ b/fs/btrfs/volumes.c
> > > @@ -2325,7 +2325,10 @@ int btrfs_init_new_device(struct btrfs_root *root, 
> > > char *device_path)
> > >   if (seeding_dev) {
> > >   sb->s_flags &= ~MS_RDONLY;
> > >   ret = btrfs_prepare_sprout(root);
> > > - BUG_ON(ret); /* -ENOMEM */
> > > + if (ret) {
> > > + btrfs_abort_transaction(trans, root, ret);
> > 
> > The transaction abort seems a bit heavy as it will take down the whole
> > filesystem. It's called from the device add ioctl, this is a restartable
> > operation.
> > 
> > Unfortunatelly btrfs_prepare_sprout is called after the transaction
> > start so btrfs_abort_transaction must be called. To avoid it, the code
> > would need to be reorganized, so the memory allocations happen in
> > advance.
> 
> On the other hand, an abort is still better than a BUG_ON(), and it may
> be easier to make incremental improvements.

That's acceptable, if there are going to be incremental improvements.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: mkfs: allow DUP on multidev fs, only warn

2016-03-24 Thread David Sterba
The DUP profile can work on multiple filesystems, the limitation is
rather artificial. Let the user make the decision and print a warning.

Signed-off-by: David Sterba 
---
 utils.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/utils.c b/utils.c
index eabc36dca7a1..9bc18d4508fc 100644
--- a/utils.c
+++ b/utils.c
@@ -2484,9 +2484,7 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 
data_profile,
 
if (dev_cnt > 1 &&
((metadata_profile | data_profile) & BTRFS_BLOCK_GROUP_DUP)) {
-   fprintf(stderr,
-   "ERROR: DUP is not allowed when FS has multiple devices\n");
-   return 1;
+   warning("DUP is not recommended on filesystem with multiple 
devices");
}
if (metadata_profile & ~allowed) {
fprintf(stderr,
-- 
2.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs

2016-03-24 Thread Al Viro
On Thu, Mar 24, 2016 at 03:20:43PM +, Al Viro wrote:

> > ocfs2_prepare_inode_for_write uses file->f_path.dentry for
> > should_remove_suid (due to needing to do it early since cluster locking
> > is unknown in setattr, according to the commit).  Having
> > should_remove_suid operate on an inode would solve that easily.
> 
> Can't do - there are filesystems that _need_ dentry for ->setattr().

Grr...  Sorry, misread what you'd written.  should_remove_suid()
ought to be switched to inode, indeed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs

2016-03-24 Thread Jeff Mahoney
On 3/24/16 11:20 AM, Al Viro wrote:
> On Thu, Nov 05, 2015 at 11:03:58PM -0500, Jeff Mahoney wrote:
> 
>> I suppose the irony here is that, AFAIK, that code is to ensure a file
>> doesn't get lost between transactions due to rename.
>>
>> Isn't the file->f_path.dentry relationship stable otherwise, though? The
>> name might change and the parent might change but the dentry that the
>> file points to won't.
> 
> Sure, it is stable.  But that doesn't guarantee anything about the
> ancestors.
> 
> btrfs_log_inode_parent() is a mess.  Look at that loop you've got there:
> 
>while (1) {
> if (!parent || d_really_is_negative(parent) || sb != 
> d_inode(parent)->i_sb)
> break;
> 
> Really?  NULL from dget_parent()?  A negative from dget_parent()?  Sodding
> superblock changing from transition to parent?
> 
> inode = d_inode(parent);
> if (root != BTRFS_I(inode)->root)
> break;
> 
> Umm...  Something like crossing a snapshot boundary?
> 
> if (BTRFS_I(inode)->generation > last_committed) {  
> ret = btrfs_log_inode(trans, root, inode,
>   LOG_INODE_EXISTS,
>   0, LLONG_MAX, ctx);
> if (ret)
> goto end_trans;
> }
> if (IS_ROOT(parent))
> break;
> 
> parent = dget_parent(parent);
> dput(old_parent);
> old_parent = parent;
> }
> 
> Note that there's not a damn thing to guarantee that those directories have
> anything to do with your file - race with rename(2) easily could've sent you
> chasing the wrong chain of ancestors.  Incidentally, what will happen if
> you do that fsync() to an unlinked file?  It still has ->d_parent pointing
> to the what used to be its parent (quite possibly itself already rmdir'ed and
> pinned down by that reference; inode is still busy, as if we'd done open and
> rmdir).  Is that what those checks are attempting to catch?  And what happens
> if rmdir happens between the check and btrfs_log_inode()?
> 
> The thing is, playing with ancestors of opened file is very easy to get
> wrong, regardless of overlayfs involvement.  And this code is anything
> but clear.  I don't know btrfs guts enough to be able to tell how much
> can actually break (as opposed to adding wrong inodes to transaction),
> but I would really suggest taking a good look at what's going on there...

Yeah, absolutely.  The file_dentry() patch that you and Miklos worked
out the other day fixes the fsync crashes for us when we use it in
btrfs_file_sync but you're 100% correct in your opinion of this code.

>> I did find a few other places where that assumption happens without any
>> questionable traversals.  Sure, all three are in file systems unlikely
>> to be used with overlayfs.
>>
>> ocfs2_prepare_inode_for_write uses file->f_path.dentry for
>> should_remove_suid (due to needing to do it early since cluster locking
>> is unknown in setattr, according to the commit).  Having
>> should_remove_suid operate on an inode would solve that easily.
> 
> Can't do - there are filesystems that _need_ dentry for ->setattr().
>
> 
> Grr...  Sorry, misread what you'd written.  should_remove_suid()
> ought to be switched to inode, indeed.

Ok, I'll write that up.

>> cifs_new_fileinfo keeps a reference to the dentry but it seems to be
>> used mostly to access the inode except for the nasty-looking call to
>> build_path_from_dentry in cifs_reopen_file, which I won't be touching.
>> That does look like a questionable traversal, especially with the "we
>> can't take the rename lock here" comment.
> 
> cifs uses fs-root-relative pathnames in protocol; nothing to do there.
> Where do you see that comment, BTW?  It uses read_seqbegin/read_seqretry
> on rename_lock there...

I must've been looking at old code.  I don't see it now either.

-Jeff

-- 
Jeff Mahoney



signature.asc
Description: OpenPGP digital signature


Re: kernel BUG when fsync'ing file in a overlayfs merged dir, located on btrfs

2016-03-24 Thread Al Viro
On Thu, Nov 05, 2015 at 11:03:58PM -0500, Jeff Mahoney wrote:

> I suppose the irony here is that, AFAIK, that code is to ensure a file
> doesn't get lost between transactions due to rename.
> 
> Isn't the file->f_path.dentry relationship stable otherwise, though? The
> name might change and the parent might change but the dentry that the
> file points to won't.

Sure, it is stable.  But that doesn't guarantee anything about the
ancestors.

btrfs_log_inode_parent() is a mess.  Look at that loop you've got there:

   while (1) {
if (!parent || d_really_is_negative(parent) || sb != 
d_inode(parent)->i_sb)
break;

Really?  NULL from dget_parent()?  A negative from dget_parent()?  Sodding
superblock changing from transition to parent?

inode = d_inode(parent);
if (root != BTRFS_I(inode)->root)
break;

Umm...  Something like crossing a snapshot boundary?

if (BTRFS_I(inode)->generation > last_committed) {  
ret = btrfs_log_inode(trans, root, inode,
  LOG_INODE_EXISTS,
  0, LLONG_MAX, ctx);
if (ret)
goto end_trans;
}
if (IS_ROOT(parent))
break;

parent = dget_parent(parent);
dput(old_parent);
old_parent = parent;
}

Note that there's not a damn thing to guarantee that those directories have
anything to do with your file - race with rename(2) easily could've sent you
chasing the wrong chain of ancestors.  Incidentally, what will happen if
you do that fsync() to an unlinked file?  It still has ->d_parent pointing
to the what used to be its parent (quite possibly itself already rmdir'ed and
pinned down by that reference; inode is still busy, as if we'd done open and
rmdir).  Is that what those checks are attempting to catch?  And what happens
if rmdir happens between the check and btrfs_log_inode()?

The thing is, playing with ancestors of opened file is very easy to get
wrong, regardless of overlayfs involvement.  And this code is anything
but clear.  I don't know btrfs guts enough to be able to tell how much
can actually break (as opposed to adding wrong inodes to transaction),
but I would really suggest taking a good look at what's going on there...

> I did find a few other places where that assumption happens without any
> questionable traversals.  Sure, all three are in file systems unlikely
> to be used with overlayfs.
> 
> ocfs2_prepare_inode_for_write uses file->f_path.dentry for
> should_remove_suid (due to needing to do it early since cluster locking
> is unknown in setattr, according to the commit).  Having
> should_remove_suid operate on an inode would solve that easily.

Can't do - there are filesystems that _need_ dentry for ->setattr().

> cifs_new_fileinfo keeps a reference to the dentry but it seems to be
> used mostly to access the inode except for the nasty-looking call to
> build_path_from_dentry in cifs_reopen_file, which I won't be touching.
> That does look like a questionable traversal, especially with the "we
> can't take the rename lock here" comment.

cifs uses fs-root-relative pathnames in protocol; nothing to do there.
Where do you see that comment, BTW?  It uses read_seqbegin/read_seqretry
on rename_lock there...
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] Btrfs: Code Cleanup

2016-03-24 Thread Petr Tesarik
On Thu, 24 Mar 2016 16:03:07 +0100
David Sterba  wrote:

> On Sun, Mar 20, 2016 at 03:11:11PM +0800, Flex Liu wrote:
>[...]
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2325,7 +2325,10 @@ int btrfs_init_new_device(struct btrfs_root *root, 
> > char *device_path)
> > if (seeding_dev) {
> > sb->s_flags &= ~MS_RDONLY;
> > ret = btrfs_prepare_sprout(root);
> > -   BUG_ON(ret); /* -ENOMEM */
> > +   if (ret) {
> > +   btrfs_abort_transaction(trans, root, ret);
> 
> The transaction abort seems a bit heavy as it will take down the whole
> filesystem. It's called from the device add ioctl, this is a restartable
> operation.
> 
> Unfortunatelly btrfs_prepare_sprout is called after the transaction
> start so btrfs_abort_transaction must be called. To avoid it, the code
> would need to be reorganized, so the memory allocations happen in
> advance.

On the other hand, an abort is still better than a BUG_ON(), and it may
be easier to make incremental improvements.

Just my 2 cents (I haven't tried it),
Petr T
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] Btrfs: Code Cleanup

2016-03-24 Thread David Sterba
On Sun, Mar 20, 2016 at 03:11:11PM +0800, Flex Liu wrote:
> From: Flex Liu 
> 
> In fs/btrfs/volumes.c:2328
> 
> if (seeding_dev) {
> sb->s_flags &= ~MS_RDONLY;
> ret = btrfs_prepare_sprout(root);
> BUG_ON(ret); /* -ENOMEM */
> }
> 
> the error code would be return from:
> 
> fs_devs = kzalloc(sizeof(*fs_devs), GFP_NOFS);
> if (!fs_devs)
> return ERR_PTR(-ENOMEM);

> Insufficient memory in btrfs would let the kernel panic, it suboptimal.
> instead, we should return the error code in btrfs_init_new_device to
> btrfs_ioctl.
> 
> Hello kernel list.
> This is my first patch for kernel, so if I missed some of the guidelines,
> please be patient :) I hope everything is explained in the commit message.

So a few comments:

- the subject line should say something like

  "handle errors in btrfs_init_new_device"
  or "replace BUG_ON with proper error handling",

  because it's not a code cleanup in the sense we commonly use

- you don't need to quote the code in the changelog, we can see it in
  the sources (unless you want to point out something very specific that
  might not be obvious)

> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2325,7 +2325,10 @@ int btrfs_init_new_device(struct btrfs_root *root, 
> char *device_path)
>   if (seeding_dev) {
>   sb->s_flags &= ~MS_RDONLY;
>   ret = btrfs_prepare_sprout(root);
> - BUG_ON(ret); /* -ENOMEM */
> + if (ret) {
> + btrfs_abort_transaction(trans, root, ret);

The transaction abort seems a bit heavy as it will take down the whole
filesystem. It's called from the device add ioctl, this is a restartable
operation.

Unfortunatelly btrfs_prepare_sprout is called after the transaction
start so btrfs_abort_transaction must be called. To avoid it, the code
would need to be reorganized, so the memory allocations happen in
advance.

Fixing the error handling might need more patches. btrfs_prepare_sprout
can be split into parts that do just allocations and initialization and
do not touch the filesystem state (like dropping the seeding flag).

The first part will hopefully cover all failure possibilities, before
the transaction stargs, the second shall not fail at all and the error
checking can be avoided completely.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: send: fix handling of multiple snapshots

2016-03-24 Thread David Sterba
On Thu, Mar 24, 2016 at 04:47:28PM +0900, Tsutomu Itoh wrote:
> We cannot send multiple snapshots at once.
> 
> [before fix]
> # btrfs send ./snap[12] > snap12.data
> At subvol ./snap1
> At subvol ./snap2
> ERROR: parent determination failed for 0
> #
> 
> [after fix]
> # btrfs send ./snap[12] > snap12.data
> At subvol ./snap1
> At subvol ./snap2
> #
> 
> Signed-off-by: Tsutomu Itoh 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework

2016-03-24 Thread David Sterba
On Wed, Mar 23, 2016 at 10:25:51AM +0800, Qu Wenruo wrote:
> Thank you for your interest in dedupe patchset first.
> 
> In fact I'm quite afraid if there is no one interest in the patchset, it 
> may be delayed again to 4.8.

It's not about lack of interest, the high-profile features need time and
input from several people that will supposedly cover all aspects.

> David Sterba wrote on 2016/03/22 14:38 +0100:
> There are 3 dedupe related on-disk items.
> 
> 1) dedupe status
> Used by both dedupe backends. Mainly used to record the dedupe
> backend info, allowing btrfs to resume its dedupe setup after umount.
> 
> Key contents:
> Objectid , Type   , Offset
>(0, DEDUPE_STATUS_ITEM_KEY , 0  )

Please use the newly added BTRFS_PERSISTENT_ITEM_KEY instead of a new
key type. As this is the second user of that item, there's no precendent
how to select the subtype. Right now 0 is for the dev stats item, but
I'd like to leave some space between them, so it should be 256 at best.
The space is 64bit so there's enough room but this also means defining
the on-disk format.

> >> 4) Ioctl interface with persist dedup status
> >
> > I'd like to see the ioctl specified in more detail. So far there's
> > enable, disable and status. I'd expect some way to control the in-memory
> > limits, let it "forget" current hash cache, specify the dedupe chunk
> > size, maybe sync of the in-memory hash cache to disk.
> 
> So current and planned ioctl should be the following, with some details 
> related to your in-memory limit control concerns.
> 
> 1) Enable
> Enable dedupe if it's not enabled already. (disabled -> enabled)

Ok, so it should also take a parameter which bckend is about to be
enabled.

> Or change current dedupe setting to another. (re-configure)

Doing that in 'enable' sounds confusing, any changes belong to a
separate command.

> For dedupe_bs/backend/hash algorithm(only SHA256 yet) change, it
> will disable dedupe(dropping all hash) and then enable with new
> setting.
> 
> For in-memory backend, if only limit is different from previous
> setting, limit can be changed on the fly without dropping any hash.

This is obviously misplaced in 'enable'.

> 2) Disable
> Disable will drop all hash and delete the dedupe tree if it exists.
> Imply a full sync_fs().

That is again combining too many things into one. Say I want to disable
deduplication and want to enable it later. And not lose the whole state
between that. Not to say deleting the dedup tree.

IOW, deleting the tree belongs to a separate command, though in the
userspace tools it could be done in one command, but we're talking about
the kernel ioctls now.

I'm not sure if the sync is required, but it's acceptable for first
implementation.

> 
> 3) Status
> Output basic status of current dedupe.
> Including running status(disabled/enabled), dedupe block size, hash
> algorithm, and limit setting for in-memory backend.

Agreed. So this is basically the settings and static info.

> 4) (PLANNED) In-memory hash size querying
> Allowing userspace to query in-memory hash structure header size.
> Used for "btrfs dedupe enable" '-l' option to output warning if user
> specify memory size larger than 1/4 of the total memory.

And this reflects the run-time status. Ok.

> 5) (PLANNED) Dedeup rate statistics
> Should be handy for user to know the dedupe rate so they can further
> fine tuning their dedup setup.

Similar as above, but for a different type of data. Ok.

> So for your "in-memory limit control", just enable it with different limit.
> For "dedupe block size change", just enable it with different dedupe_bs.
> For "forget hash", just disable it.

I can comment once the semantics of 'enable' are split, but basically I
want an interface to control the deduplication cache.

> And for "write in-memory hash onto disk", not planned and may never do 
> it due to the complexity, sorry.

I'm not asking you to do it, definetelly not for the initial
implementation, but sync from memory to disk is IMO something that we
can expect users to ask for. The percieved complexity may shift
implementation to the future, but we should take it into account.

> >> 5) Ability to disable dedup for given dirs/files
> >
> > This would be good to extend to subvolumes.
> 
> I'm sorry that I didn't quite understand the difference.
> Doesn't dir includes subvolume?

If I enable deduplication on the entire subvolume, it will affect all
subdirectories. Not the other way around.

> Or xattr for subvolume is only restored in its parent subvolume, and 
> won't be copied for its snapshot?

The xattrs are copied to the snapshot.

> >> TODO:
> >> 1) Add extent-by-extent comparison for faster but more conflicting 
> >> algorithm
> >> Current SHA256 hash is quite slow, and for some old(5 years ago) CPU,
> >> CPU may even be a bottleneck other than IO.
> >> But for fa

Re: [RFC][PATCH] btrfs: allow balancing to dup with multi-device

2016-03-24 Thread David Sterba
On Wed, Mar 23, 2016 at 02:22:59PM -0400, Austin S. Hemmelgarn wrote:
> Currently, we don't allow the user to try and rebalance to a dup profile
> on a multi-device filesystem.  In most cases, this is a perfectly sensible
> restriction as raid1 uses the same amount of space and provides better
> protection.
> 
> However, when reshaping a multi-device filesystem down to a single device
> filesystem, this requires the user to convert metadata and system chunks
> to single profile before deleting devices, and then convert again to dup,
> which leaves a period of time where metadata integrity is reduced.
> 
> This patch removes the single-device-only restriction from converting to
> dup profile to remove this potential data integrity reduction.
> 
> Signed-off-by: Austin S. Hemmelgarn 

Ack. I did exactly same patch a week ago, the changelogs sound
equivalent to me, so I'll drop my patch and take yours.

http://repo.or.cz/linux-2.6/btrfs-unstable.git/commitdiff/d9929bfeec0abf24ae7fdbc5855914fe7f3886c9?hp=b562e44f507e863c6792946e4e1b1449fbbac85d

I did the btrfs-progs side as well, that prints a warning but otherwise
lets the user create dup on multi-dev filesystem.

http://repo.or.cz/btrfs-progs-unstable/devel.git/commitdiff/22c67c3105c887f541d7353d79d9e11b68e70998
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstest: btrfs: remove _need_to_be_root fix btrfs/118

2016-03-24 Thread Anand Jain



On 03/24/2016 07:08 PM, Filipe Manana wrote:

On Thu, Mar 24, 2016 at 10:13 AM, Anand Jain  wrote:

commit 56ff01f471c9b72de0a447b37cdb1051adcede6a
 xfstests: remove _need_to_be_root

Removed _need_to_be_root(), and so btrfs/118 needs an update


https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/commit/?id=3a9242688d64c669ca764ee2c5599abfc885c131

It's always a good idea to check if one is using the latest git branch
and the mailing list... :)


Err. Yep. Sorry for the noise.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstest: btrfs: update 048.out inline with ui changes

2016-03-24 Thread Anand Jain





On 03/24/2016 07:13 PM, Eryu Guan wrote:

On Thu, Mar 24, 2016 at 06:11:12PM +0800, Anand Jain wrote:

To be inline with progs changes
   28831d54895443e5fc795392f23ce3a8b122cb71
   btrfs-progs: cmd property: switch to common error message wrapper
update 048.out

Signed-off-by: Anand Jain 
---
  tests/btrfs/048.out | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/btrfs/048.out b/tests/btrfs/048.out
index 0b20d0b5976c..1d3ccef8e870 100644
--- a/tests/btrfs/048.out
+++ b/tests/btrfs/048.out
@@ -9,7 +9,7 @@ label=foobar
  ***
  label=
  ***
-ERROR: object is not compatible with property
+ERROR: object is not compatible with property: label


This fails test with old btrfs-progs. I think a filter can be added to
btrfs/048 to filter out the ": label" part, so that test passes with
both old and new btrfs-progs.


 Thanks.  Yep. I shall do that.



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstest: btrfs: update 048.out inline with ui changes

2016-03-24 Thread Eryu Guan
On Thu, Mar 24, 2016 at 06:11:12PM +0800, Anand Jain wrote:
> To be inline with progs changes
>   28831d54895443e5fc795392f23ce3a8b122cb71
>   btrfs-progs: cmd property: switch to common error message wrapper
> update 048.out
> 
> Signed-off-by: Anand Jain 
> ---
>  tests/btrfs/048.out | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/btrfs/048.out b/tests/btrfs/048.out
> index 0b20d0b5976c..1d3ccef8e870 100644
> --- a/tests/btrfs/048.out
> +++ b/tests/btrfs/048.out
> @@ -9,7 +9,7 @@ label=foobar
>  ***
>  label=
>  ***
> -ERROR: object is not compatible with property
> +ERROR: object is not compatible with property: label

This fails test with old btrfs-progs. I think a filter can be added to
btrfs/048 to filter out the ": label" part, so that test passes with
both old and new btrfs-progs.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstest: btrfs: update 048.out inline with ui changes

2016-03-24 Thread Filipe Manana
On Thu, Mar 24, 2016 at 10:11 AM, Anand Jain  wrote:
> To be inline with progs changes
>   28831d54895443e5fc795392f23ce3a8b122cb71
>   btrfs-progs: cmd property: switch to common error message wrapper
> update 048.out
>
> Signed-off-by: Anand Jain 
> ---
>  tests/btrfs/048.out | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/btrfs/048.out b/tests/btrfs/048.out
> index 0b20d0b5976c..1d3ccef8e870 100644
> --- a/tests/btrfs/048.out
> +++ b/tests/btrfs/048.out
> @@ -9,7 +9,7 @@ label=foobar
>  ***
>  label=
>  ***
> -ERROR: object is not compatible with property
> +ERROR: object is not compatible with property: label

This now makes the test fail with older versions of btrfs-progs...

Same observation as before, update your git branch and read the mailing list:

https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/commit/?id=232593dea51c53f3251f9b41cd89e9dba20f0159
https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/commit/?id=2dd2cea4b51810bbb4090ebdc20ee799e63e5fd8

>  ***
>
>  Testing subvolume ro property
> --
> 2.7.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstest: btrfs: remove _need_to_be_root fix btrfs/118

2016-03-24 Thread Filipe Manana
On Thu, Mar 24, 2016 at 10:13 AM, Anand Jain  wrote:
> commit 56ff01f471c9b72de0a447b37cdb1051adcede6a
> xfstests: remove _need_to_be_root
>
> Removed _need_to_be_root(), and so btrfs/118 needs an update

https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/commit/?id=3a9242688d64c669ca764ee2c5599abfc885c131

It's always a good idea to check if one is using the latest git branch
and the mailing list... :)

>
> Signed-off-by: Anand Jain 
> ---
>  tests/btrfs/118 | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/tests/btrfs/118 b/tests/btrfs/118
> index 3ed1cbef1068..740cb92b0669 100755
> --- a/tests/btrfs/118
> +++ b/tests/btrfs/118
> @@ -45,7 +45,6 @@ _cleanup()
>  . ./common/dmflakey
>
>  # real QA test starts here
> -_need_to_be_root
>  _supported_fs btrfs
>  _supported_os Linux
>  _require_scratch
> --
> 2.7.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstest: btrfs: remove _need_to_be_root fix btrfs/118

2016-03-24 Thread Eryu Guan
On Thu, Mar 24, 2016 at 06:13:59PM +0800, Anand Jain wrote:
> commit 56ff01f471c9b72de0a447b37cdb1051adcede6a
> xfstests: remove _need_to_be_root
> 
> Removed _need_to_be_root(), and so btrfs/118 needs an update

It has been fixed by
3a92426 btrfs/118: remove call to _need_to_be_root

Perhaps you need to update your repository :)

Thanks,
Eryu

> 
> Signed-off-by: Anand Jain 
> ---
>  tests/btrfs/118 | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tests/btrfs/118 b/tests/btrfs/118
> index 3ed1cbef1068..740cb92b0669 100755
> --- a/tests/btrfs/118
> +++ b/tests/btrfs/118
> @@ -45,7 +45,6 @@ _cleanup()
>  . ./common/dmflakey
>  
>  # real QA test starts here
> -_need_to_be_root
>  _supported_fs btrfs
>  _supported_os Linux
>  _require_scratch
> -- 
> 2.7.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs: keep sysfs target add in the last

2016-03-24 Thread Anand Jain
Sysfs create context should come in the last, so that we
don't have to undo sysfs operation for the reason that any
other operation has failed.

Signed-off-by: Anand Jain 
---
 fs/btrfs/dev-replace.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index d38cad37ba27..025f42ef5ab3 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -397,10 +397,6 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR;
btrfs_dev_replace_unlock(dev_replace, 1);
 
-   ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device);
-   if (ret)
-   btrfs_err(fs_info, "kobj add dev failed %d\n", ret);
-
btrfs_wait_ordered_roots(fs_info, -1);
 
/* force writing the updated state information to disk */
@@ -428,6 +424,9 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
WARN_ON(ret);
}
 
+   if (btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device))
+   btrfs_err(fs_info, "kobj add dev failed during replace\n");
+
return ret;
 
 leave:
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs: refactor btrfs_dev_replace_start for reuse

2016-03-24 Thread Anand Jain
A refactor patch, and avoids user input verification in the
btrfs_dev_replace_start(), and so this function can be reused.

Signed-off-by: Anand Jain 
---
 fs/btrfs/dev-replace.c | 62 ++
 fs/btrfs/dev-replace.h |  4 +++-
 fs/btrfs/ioctl.c   |  2 +-
 3 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 025f42ef5ab3..2b926867d136 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -305,8 +305,8 @@ void btrfs_after_dev_replace_commit(struct btrfs_fs_info 
*fs_info)
dev_replace->cursor_left_last_write_of_item;
 }
 
-int btrfs_dev_replace_start(struct btrfs_root *root,
-   struct btrfs_ioctl_dev_replace_args *args)
+int btrfs_dev_replace_start(struct btrfs_root *root, char *tgtdev_name,
+   u64 srcdevid, char *srcdev_name, int read_src)
 {
struct btrfs_trans_handle *trans;
struct btrfs_fs_info *fs_info = root->fs_info;
@@ -315,29 +315,16 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
struct btrfs_device *tgt_device = NULL;
struct btrfs_device *src_device = NULL;
 
-   switch (args->start.cont_reading_from_srcdev_mode) {
-   case BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_ALWAYS:
-   case BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_AVOID:
-   break;
-   default:
-   return -EINVAL;
-   }
-
-   if ((args->start.srcdevid == 0 && args->start.srcdev_name[0] == '\0') ||
-   args->start.tgtdev_name[0] == '\0')
-   return -EINVAL;
-
/* the disk copy procedure reuses the scrub code */
mutex_lock(&fs_info->volume_mutex);
-   ret = btrfs_dev_replace_find_srcdev(root, args->start.srcdevid,
-   args->start.srcdev_name,
-   &src_device);
+   ret = btrfs_dev_replace_find_srcdev(root, srcdevid,
+   srcdev_name, &src_device);
if (ret) {
mutex_unlock(&fs_info->volume_mutex);
return ret;
}
 
-   ret = btrfs_init_dev_replace_tgtdev(root, args->start.tgtdev_name,
+   ret = btrfs_init_dev_replace_tgtdev(root, tgtdev_name,
src_device, &tgt_device);
mutex_unlock(&fs_info->volume_mutex);
if (ret)
@@ -364,12 +351,11 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
break;
case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
-   args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_ALREADY_STARTED;
+   ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_ALREADY_STARTED;
goto leave;
}
 
-   dev_replace->cont_reading_from_srcdev_mode =
-   args->start.cont_reading_from_srcdev_mode;
+   dev_replace->cont_reading_from_srcdev_mode = read_src;
WARN_ON(!src_device);
dev_replace->srcdev = src_device;
WARN_ON(!tgt_device);
@@ -394,7 +380,6 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
dev_replace->cursor_right = 0;
dev_replace->is_valid = 1;
dev_replace->item_needs_writeback = 1;
-   args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR;
btrfs_dev_replace_unlock(dev_replace, 1);
 
btrfs_wait_ordered_roots(fs_info, -1);
@@ -416,10 +401,8 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
  &dev_replace->scrub_progress, 0, 1);
 
ret = btrfs_dev_replace_finishing(fs_info, ret);
-   /* don't warn if EINPROGRESS, someone else might be running scrub */
if (ret == -EINPROGRESS) {
-   args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS;
-   ret = 0;
+   ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS;
} else {
WARN_ON(ret);
}
@@ -437,6 +420,35 @@ leave:
return ret;
 }
 
+int btrfs_dev_replace_by_ioctl(struct btrfs_root *root,
+   struct btrfs_ioctl_dev_replace_args *args)
+{
+   int ret;
+
+   switch (args->start.cont_reading_from_srcdev_mode) {
+   case BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_ALWAYS:
+   case BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_AVOID:
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   if ((args->start.srcdevid == 0 && args->start.srcdev_name[0] == '\0') ||
+   args->start.tgtdev_name[0] == '\0')
+   return -EINVAL;
+
+   ret = btrfs_dev_replace_start(root, args->start.tgtdev_name,
+   args->start.srcdevid,
+   args->start.srcdev_name,
+   
args->start.cont_reading_from_srcdev_m

[PATCH 1/3] btrfs: use fs_info directly

2016-03-24 Thread Anand Jain
Local variable fs_info, contains root->fs_info, use it.

Signed-off-by: Anand Jain 
---
 fs/btrfs/dev-replace.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index a1d6652e0c47..d38cad37ba27 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -375,7 +375,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
WARN_ON(!tgt_device);
dev_replace->tgtdev = tgt_device;
 
-   btrfs_info_in_rcu(root->fs_info,
+   btrfs_info_in_rcu(fs_info,
  "dev_replace from %s (devid %llu) to %s started",
  src_device->missing ? "" :
rcu_str_deref(src_device->name),
@@ -399,9 +399,9 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
 
ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device);
if (ret)
-   btrfs_err(root->fs_info, "kobj add dev failed %d\n", ret);
+   btrfs_err(fs_info, "kobj add dev failed %d\n", ret);
 
-   btrfs_wait_ordered_roots(root->fs_info, -1);
+   btrfs_wait_ordered_roots(fs_info, -1);
 
/* force writing the updated state information to disk */
trans = btrfs_start_transaction(root, 0);
@@ -419,7 +419,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
  btrfs_device_get_total_bytes(src_device),
  &dev_replace->scrub_progress, 0, 1);
 
-   ret = btrfs_dev_replace_finishing(root->fs_info, ret);
+   ret = btrfs_dev_replace_finishing(fs_info, ret);
/* don't warn if EINPROGRESS, someone else might be running scrub */
if (ret == -EINPROGRESS) {
args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS;
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fstest: btrfs: remove _need_to_be_root fix btrfs/118

2016-03-24 Thread Anand Jain
commit 56ff01f471c9b72de0a447b37cdb1051adcede6a
xfstests: remove _need_to_be_root

Removed _need_to_be_root(), and so btrfs/118 needs an update

Signed-off-by: Anand Jain 
---
 tests/btrfs/118 | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/btrfs/118 b/tests/btrfs/118
index 3ed1cbef1068..740cb92b0669 100755
--- a/tests/btrfs/118
+++ b/tests/btrfs/118
@@ -45,7 +45,6 @@ _cleanup()
 . ./common/dmflakey
 
 # real QA test starts here
-_need_to_be_root
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fstest: btrfs: update 048.out inline with ui changes

2016-03-24 Thread Anand Jain
To be inline with progs changes
  28831d54895443e5fc795392f23ce3a8b122cb71
  btrfs-progs: cmd property: switch to common error message wrapper
update 048.out

Signed-off-by: Anand Jain 
---
 tests/btrfs/048.out | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/btrfs/048.out b/tests/btrfs/048.out
index 0b20d0b5976c..1d3ccef8e870 100644
--- a/tests/btrfs/048.out
+++ b/tests/btrfs/048.out
@@ -9,7 +9,7 @@ label=foobar
 ***
 label=
 ***
-ERROR: object is not compatible with property
+ERROR: object is not compatible with property: label
 ***
 
 Testing subvolume ro property
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] fstest: btrfs: test single 4k extent after subpagesize buffered writes

2016-03-24 Thread Eryu Guan
On Wed, Mar 23, 2016 at 09:55:52PM -0700, Liu Bo wrote:
> This is to test if COW enabled btrfs can end up with single 4k extents
> when doing subpagesize buffered writes.
> 
> Signed-off-by: Liu Bo 

Looks good to me.

Reviewed-by: Eryu Guan 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: send: fix handling of multiple snapshots

2016-03-24 Thread Tsutomu Itoh
We cannot send multiple snapshots at once.

[before fix]
# btrfs send ./snap[12] > snap12.data
At subvol ./snap1
At subvol ./snap2
ERROR: parent determination failed for 0
#

[after fix]
# btrfs send ./snap[12] > snap12.data
At subvol ./snap1
At subvol ./snap2
#

Signed-off-by: Tsutomu Itoh 
---
 cmds-send.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/cmds-send.c b/cmds-send.c
index 3e34d75..a220a49 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -683,15 +683,16 @@ int cmd_send(int argc, char **argv)
if (ret < 0)
goto out;
 
-   /* done with this subvol, so add it to the clone sources */
-   ret = add_clone_source(&send, root_id);
-   if (ret < 0) {
-   error("not enough memory");
-   goto out;
+   if (!full_send) {
+   /* done with this subvol, so add it to the clone 
sources */
+   ret = add_clone_source(&send, root_id);
+   if (ret < 0) {
+   error("not enough memory");
+   goto out;
+   }
}
 
parent_root_id = 0;
-   full_send = 0;
}
 
ret = 0;
-- 
2.6.4


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID-1 refuses to balance large drive

2016-03-24 Thread Andrew Vaughan
Hi Brad

Just a user here, not a dev.

I think I might have run into a similar bug about 6 months ago.

At the time I was running Debian stable.  (iirc that is kernel 3.16
and probably btrfs-progs of a similar vintage).

The filesystem was originally a 2 x 6TB array with a 4TB drive added
later when space began to get low.  I'm pretty sure I must have done
at least a partial balance after adding the 4TB drive, but something
like 1TB free on each of the two 6GB drives, and 2GB on the 4TB would
have been 'good enough for me'.

It was nearly full again when a copy unexpectedly reported
out-of-space.  Balance didn't fix it.  In retrospect btrfs had
probably run out of chunks on both 6TB drives.

I'm not sure what actually fixed it.  I upgraded to Debian testing
(something I was going to do soon anyway).  I might have also
temporarily added another drive.   (I have since had a 6TB drive fail,
and btrfs is running happily on 2x4TB, and 1x6TB).

More inline below.

On 24 March 2016 at 05:34, Chris Murphy  wrote:
> On Wed, Mar 23, 2016 at 10:51 AM, Brad Templeton  wrote:
>> Thanks for assist.  To reiterate what I said in private:
>>
>> a) I am fairly sure I swapped drives by adding the 6TB drive and then
>> removing the 2TB drive, which would not have made the 6TB think it was
>> only 2TB.The btrfs statistics commands have shown from the beginning
>> the size of the device as 6TB, and that after the remove, it haad 4TB
>> unallocated.
>
> I agree this seems to be consistent with what's been reported.
>



>>
>> Some options remaining open to me:
>>
>> a) I could re-add the 2TB device, which is still there.  Then balance
>> again, which hopefully would move a lot of stuff.   Then remove it again
>> and hopefully the new stuff would distribute mostly to the large drive.
>>  Then I could try balance again.
>
> Yeah, to do this will require -f to wipe the signature info from that
> drive when you add it. But I don't think this is a case of needing
> more free space, I think it might be due to the odd number of drives
> that are also fairly different in size.
>
If I recall correctly, when I did a device delete, I thought device
delete did remove the btrfs signature.  But I could be wrong

> But then what happens when you delete the 2TB drive after the balance?
> Do you end up right back in this same situation?
>

If balance manages to get the data properly distributed across the
drives, then the 2TB should be mostly empty, and device delete should
be able to remove the 2TB disk.   I successfully added a 4TB disk, did
a balance, and then removed a failing 6TB from the 3 drive array
above.

>
>>
>> b) It was suggested I could (with a good backup) convert the drive to
>> non-RAID1 to free up tons of space and then re-convert.  What's the
>> precise procedure for that?  Perhaps I can do it with a limit to see how
>> it works as an experiment?   Any way to specifically target the blocks
>> that have their two copies on the 2 smaller drives for conversion?
>
> btrfs balance -dconvert=single -mconvert=single -f   ## you have to
> use -f to force reduction in redundancy
> btrfs balance -dconvert=raid1 -mconvert=raid1

I would probably try upgrading to a newer kernel + btrfs-progs first.
Before converting back to raid1, I would also run btrfs device usage
and check to see whether the all devices have approximately the same
amount of unallocated space.  If they don't, maybe try running a full
balance again.



Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html