Re: Unexpected: send/receive much slower than rsync?
Hi, I am not an expert, just a btrfs user who uses send/receive quite frequently, but I am pretty sure your problem is not on the receive side but on the send side. Can you check with e.g. iotop whether receive is writing anything to the disk, or whether it is just waiting for send? How much is send reading from disk, and how much memory is it allocating? I am asking because I reckon your problem is caused by the way clone detection is done in send. There is a proposed patch, https://patchwork.kernel.org/patch/9245287/, that addresses the problem. It did indeed help me when I had a similar problem while trying to send a previously deduplicated filesystem!

Greetings
Hermann

On 04/11/2017 05:11 PM, J. Hart wrote:
> I'm trying to update from an old snapshot of a directory to a new one using send/receive. It seems a great deal slower than I was expecting, perhaps much slower than rsync, and has been running for hours. Everything looks ok with how I set up the snapshots, and there are no error messages, but I don't think it should be running this long. The directory structure is rather complex, so that may have something to do with it. It contains reflinked incremental backups of root file systems from a number of machines. It should not actually be very large due to the reflinks.
>
> Sending the old version of the snapshot for the directory did not seem to take this long, and I expected the "send -p " to be much faster than that.
>
> I tried running the "send" and "receive" with "-vv" to get more detail on what was happening.
>
> I had thought that btrfs send/receive purely dealt with block/extent level changes.
>
> I could be mistaken, but it seems that btrfs receive actually does a great deal of manipulation at the level of individual files, and rather less efficiently than rsync at that. I am not sure whether it is using system calls to do this, or actual shell commands themselves. I see quite a bit of what looks like file level manipulation in the verbose output. It is indeed very fast for simple directory trees, even with very large files. However, it seems to be far slower than rsync with moderately complex directory trees, even if no large files are present.
>
> I hope I'm overlooking something, and that this is not actually the case. Any ideas on this?
>
> J. Hart
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS as a GlusterFS storage back-end, and what I've learned from using it as such.
Not to change the topic too much, but is there a suite of tracing scripts that one can attach to their BtrFS installation to gather metrics about tree locking performance? We see an awful lot of machines with a task waiting on btrfs_tree_lock, and a bunch of other tasks that are also in disk sleep waiting on BtrFS. We also see a bunch of hung timeouts around btrfs_destroy_inode. We're running kernel 4.8, so we can pretty easily plug BPF-based probes into the kernel to get this information and aggregate it. Rather than doing this work ourselves, I'm wondering if anyone else has a good set of tools to collect perf data about BtrFS performance and lock contention?

On Tue, Apr 11, 2017 at 10:49 PM, Qu Wenruo wrote:
> At 04/11/2017 11:40 PM, Austin S. Hemmelgarn wrote:
>>
>> About a year ago now, I decided to set up a small storage cluster to store backups (and partially replace Dropbox for my usage, but that's a separate story). I ended up using GlusterFS as the clustering software itself, and BTRFS as the back-end storage.
>>
>> GlusterFS itself is actually a pretty easy workload as far as cluster software goes. It does some processing prior to actually storing the data (a significant amount in fact), but the actual on-device storage on any given node is pretty simple. You have the full directory structure for the whole volume, and whatever files happen to be on that node are located within that tree exactly like they are in the GlusterFS volume. Beyond the basic data, gluster only stores 2-4 xattrs per-file (which are used to track synchronization, and also for its internal data scrubbing), and a directory called .glusterfs in the top of the back-end storage location for the volume which contains the data required to figure out which node a file is on.
>> Overall, the access patterns mostly mirror whatever is using the Gluster volume, or are reduced to slow streaming writes (when writing files and the back-end nodes are computationally limited instead of I/O limited), with the addition of some serious metadata operations in the .glusterfs directory (lots of stat calls there, together with large numbers of small files).
>
> Any real world experience is welcomed to share.
>
>> As far as overall performance, BTRFS is actually on par for this usage with both ext4 and XFS (at least, on my hardware it is), and I actually see more SSD friendly access patterns when using BTRFS in this case than any other FS I tried.
>
> We also find that, for pure buffered read/write, btrfs is no worse than traditional fs.
>
> In our PostgreSQL test, btrfs can even get a little better performance than ext4/xfs when handling DB files.
>
> But if using btrfs for PostgreSQL Write Ahead Log (WAL), then it's completely another thing. Btrfs falls far behind ext4/xfs on HDD, only half of the TPC performance for low concurrency load.
>
> Due to btrfs CoW, btrfs causes extra IO for fsync. For example, if only to fsync 4K data, btrfs can cause a 64K metadata write for default mkfs options. (One tree block for the log root tree, one tree block for the log tree, multiplied by 2 for the default DUP profile.)
>
>> After some serious experimentation with various configurations for this during the past few months, I've noticed a handful of other things:
>>
>> 1. The 'ssd' mount option does not actually improve performance on these SSD's. To a certain extent, this actually surprised me at first, but having seen Hans' e-mail and what he found about this option, it actually makes sense, since erase-blocks on these devices are 4MB, not 2MB, and the drives have a very good FTL (so they will aggregate all the little writes properly).
>>
>> Given this, I'm beginning to wonder if it actually makes sense to not automatically enable this on mount when dealing with certain types of storage (for example, most SATA and SAS SSD's have reasonably good FTL's, so I would expect them to have similar behavior). Extrapolating further, it might instead make sense to just never automatically enable this, and expose the value this option is manipulating as a mount option, as there are other circumstances where setting specific values could improve performance (for example, if you're on hardware RAID6, setting this to the stripe size would probably improve performance on many cheaper controllers).
>>
>> 2. Up to a certain point, running a single larger BTRFS volume with multiple sub-volumes is more computationally efficient than running multiple smaller BTRFS volumes. More specifically, there is lower load on the system and lower CPU utilization by BTRFS itself without much noticeable difference in performance (in my tests it was about 0.5-1% performance difference, YMMV). To a certain extent this makes some sense, but the turnover point was actually a lot higher than I expected (with this workload, the turnover point was a
Re: [PATCH 04/25] fs: Provide infrastructure for dynamic BDIs in filesystems
> +	if (sb->s_iflags & SB_I_DYNBDI) {
> +		bdi_put(sb->s_bdi);
> +		sb->s_bdi = &noop_backing_dev_info;

At some point I'd really like to get rid of noop_backing_dev_info and have a NULL here. Otherwise this looks fine.

Reviewed-by: Christoph Hellwig
Re: [PATCH 08/25] btrfs: Convert to separately allocated bdi
Looks fine,

Reviewed-by: Christoph Hellwig
Re: [PATCH 1/9] Use RWF_* flags for AIO operations
>
> +	if (unlikely(iocb->aio_rw_flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC))) {
> +		pr_debug("EINVAL: aio_rw_flags set with incompatible flags\n");
> +		return -EINVAL;
> +	}
> +	if (iocb->aio_rw_flags & RWF_HIPRI)
> +		req->common.ki_flags |= IOCB_HIPRI;
> +	if (iocb->aio_rw_flags & RWF_DSYNC)
> +		req->common.ki_flags |= IOCB_DSYNC;
> +	if (iocb->aio_rw_flags & RWF_SYNC)
> +		req->common.ki_flags |= (IOCB_DSYNC | IOCB_SYNC);

Please introduce a common helper to share this code between the synchronous and the aio path.
Re: [PATCH 5/9] nowait aio: return on congested block device
As mentioned last time around, this should be a REQ_NOWAIT flag so that it can be easily passed down to the request layer.

> +static inline void bio_wouldblock_error(struct bio *bio)
> +{
> +	bio->bi_error = -EAGAIN;
> +	bio_endio(bio);
> +}

Please skip this helper.

> +#define QUEUE_FLAG_NOWAIT	28	/* queue supports BIO_NOWAIT */

Please make the flag name a little more descriptive, this sounds like it will never wait.
Re: [PATCH 9/9] nowait aio: Return -EOPNOTSUPP if filesystem does not support
This should go into the patch that introduces IOCB_NOWAIT.
Re: [PATCH] fstests: regression test for btrfs dio read repair
On Wed, Apr 12, 2017 at 2:27 AM, Liu Bo wrote: > This case tests whether dio read can repair the bad copy if we have > a good copy. Regardless of being a test we should have always had (thanks for this!), it would be useful to mention we had a regression (as the test description in the btrfs/140 file says) and which patch fixed it (and possibly which kernel version or patch/commit introduced the regression). Just a comment/question below. > > Signed-off-by: Liu Bo > --- > tests/btrfs/140 | 152 > > tests/btrfs/140.out | 39 ++ > tests/btrfs/group | 1 + > 3 files changed, 192 insertions(+) > create mode 100755 tests/btrfs/140 > create mode 100644 tests/btrfs/140.out > > diff --git a/tests/btrfs/140 b/tests/btrfs/140 > new file mode 100755 > index 000..db56123 > --- /dev/null > +++ b/tests/btrfs/140 > @@ -0,0 +1,152 @@ > +#! /bin/bash > +# FS QA Test 140 > +# > +# Regression test for btrfs DIO read's repair during read. > +# > +#--- > +# Copyright (c) 2017 Liu Bo. All Rights Reserved. > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#--- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. 
./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_dev_pool 2
> +_require_command "$BTRFS_MAP_LOGICAL_PROG" btrfs-map-logical
> +_require_command "$FILEFRAG_PROG" filefrag
> +_require_odirect
> +
> +# helper to convert 'file offset' to btrfs logical offset
> +FILEFRAG_FILTER='
> +	if (/blocks? of (\d+) bytes/) {
> +		$blocksize = $1;
> +		next
> +	}
> +	($ext, $logical, $physical, $length) =
> +		(/^\s*(\d+):\s+(\d+)..\s+\d+:\s+(\d+)..\s+\d+:\s+(\d+):/)
> +		or next;
> +	($flags) = /.*:\s*(\S*)$/;
> +	print $physical * $blocksize, "#",
> +		$length * $blocksize, "#",
> +		$logical * $blocksize, "#",
> +		$flags, " "'
> +
> +# this makes filefrag output script readable by using a perl helper.
> +# output is one extent per line, with the numbers separated by '#'
> +# the numbers are: physical, length, logical (all in bytes)
> +# sample output: "1234#10#5678" -> physical 1234, length 10, logical 5678
> +_filter_extents()
> +{
> +	tee -a $seqres.full | $PERL_PROG -ne "$FILEFRAG_FILTER"
> +}
> +
> +_check_file_extents()
> +{
> +	cmd="filefrag -v $1"
> +	echo "# $cmd" >> $seqres.full
> +	out=`$cmd | _filter_extents`
> +	if [ -z "$out" ]; then
> +		return 1
> +	fi
> +	echo "after filter: $out" >> $seqres.full
> +	echo $out
> +	return 0
> +}
> +
> +_check_repair()
> +{
> +	filter=${1:-cat}
> +	dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | tac | $filter | grep -q -e "csum failed"
> +	if [ $? -eq 0 ]; then
> +		echo 1
> +	else
> +		echo 0
> +	fi
> +}
> +
> +_scratch_dev_pool_get 2
> +# step 1, create a raid1 btrfs which contains one 128k file.
> +echo "step 1..mkfs.btrfs" >>$seqres.full
> +
> +mkfs_opts="-d raid1"
> +_scratch_pool_mkfs $mkfs_opts >>$seqres.full 2>&1
> +
> +_scratch_mount -o nospace_cache

Why do we need to mount without space cache?
I don't see why, nor do I think it's obvious. A comment in the test mentioning why would be useful for everyone.

> +
> +$XFS_IO_PROG -f -d -c "pwrite -S 0xaa -b 128K 0 128K" "$SCRATCH_MNT/foobar" | _filter_xfs_io
> +
> +sync
> +
> +# step 2, corrupt the first 64k of one copy (on SCRATCH_DEV which is the first
> +# one in $SCRATCH_DEV_POOL
> +echo "step 2..corrupt file extent" >>$seqres.full
> +
> +extents=`_check_file_extents $SCRATCH_MNT/foobar`
> +logical_in_btrfs=`echo ${extents} | cut -d '#' -f 1`
> +physical_on_scratch=`$BTRFS_M
Re: [PATCH] fstests: regression test for btrfs buffered read's repair
On Wed, Apr 12, 2017 at 2:27 AM, Liu Bo wrote: > This case tests whether buffered read can repair the bad copy if we > have a good copy. Regardless of being a test we should have always had (thanks for this!), it would be useful to mention we had a regression (as the test description in the btrfs/141 file says) and which patch fixed it (and possibly which kernel version or patch/commit introduced the regression). Just a couple comments/questions below. > > Signed-off-by: Liu Bo > --- > tests/btrfs/141 | 152 > > tests/btrfs/141.out | 39 ++ > tests/btrfs/group | 1 + > 3 files changed, 192 insertions(+) > create mode 100755 tests/btrfs/141 > create mode 100644 tests/btrfs/141.out > > diff --git a/tests/btrfs/141 b/tests/btrfs/141 > new file mode 100755 > index 000..53fd75c > --- /dev/null > +++ b/tests/btrfs/141 > @@ -0,0 +1,152 @@ > +#! /bin/bash > +# FS QA Test 141 > +# > +# Regression test for btrfs buffered read's repair during read. > +# > +#--- > +# Copyright (c) 2017 Liu Bo. All Rights Reserved. > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#--- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! 
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_dev_pool 2
> +_require_command "$BTRFS_MAP_LOGICAL_PROG" btrfs-map-logical
> +_require_command "$FILEFRAG_PROG" filefrag
> +
> +# helper to convert 'file offset' to btrfs logical offset
> +FILEFRAG_FILTER='
> +	if (/blocks? of (\d+) bytes/) {
> +		$blocksize = $1;
> +		next
> +	}
> +	($ext, $logical, $physical, $length) =
> +		(/^\s*(\d+):\s+(\d+)..\s+\d+:\s+(\d+)..\s+\d+:\s+(\d+):/)
> +		or next;
> +	($flags) = /.*:\s*(\S*)$/;
> +	print $physical * $blocksize, "#",
> +		$length * $blocksize, "#",
> +		$logical * $blocksize, "#",
> +		$flags, " "'
> +
> +# this makes filefrag output script readable by using a perl helper.
> +# output is one extent per line, with the numbers separated by '#'
> +# the numbers are: physical, length, logical (all in bytes)
> +# sample output: "1234#10#5678" -> physical 1234, length 10, logical 5678
> +_filter_extents()
> +{
> +	tee -a $seqres.full | $PERL_PROG -ne "$FILEFRAG_FILTER"
> +}
> +
> +_check_file_extents()
> +{
> +	cmd="filefrag -v $1"
> +	echo "# $cmd" >> $seqres.full
> +	out=`$cmd | _filter_extents`
> +	if [ -z "$out" ]; then
> +		return 1
> +	fi
> +	echo "after filter: $out" >> $seqres.full
> +	echo $out
> +	return 0
> +}
> +
> +_check_repair()
> +{
> +	filter=${1:-cat}
> +	dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | tac | $filter | grep -q -e "csum failed"
> +	if [ $? -eq 0 ]; then
> +		echo 1
> +	else
> +		echo 0
> +	fi
> +}
> +
> +_scratch_dev_pool_get 2
> +# step 1, create a raid1 btrfs which contains one 128k file.
> +echo "step 1..mkfs.btrfs" >>$seqres.full
> +
> +mkfs_opts="-d raid1"
> +_scratch_pool_mkfs $mkfs_opts >>$seqres.full 2>&1
> +
> +_scratch_mount -o nospace_cache

Same as the other test, why do we need to mount without space cache? It isn't obvious whether it's needed or why; a comment in the test explaining it would be useful for everyone.

> +
> +$XFS_IO_PROG -f -d -c "pwrite -S 0xaa -b 128K 0 128K" "$SCRATCH_MNT/foobar" | _filter_xfs_io
> +
> +sync
> +
> +# step 2, corrupt the first 64k of one copy (on SCRATCH_DEV which is the first
> +# one in $SCRATCH_DEV_POOL
> +echo "step 2..corrupt file extent" >>$seqres.full
> +
> +extents=`_check_file_extents $SCRATCH_MNT/foobar`
> +logical_in_btrfs=`echo ${extents} | cut -d '#' -f 1`
> +physica
[PATCH 0/25 v3] fs: Convert all embedded bdis into separate ones
Hello,

this is the third revision of the patch series which converts all embedded occurrences of struct backing_dev_info to use standalone dynamically allocated structures. This makes bdi handling unified across all bdi users and generally removes some boilerplate code from filesystems setting up their own bdi. It also allows us to remove some code from the generic bdi implementation. The patches were only compile-tested for most filesystems (I've tested mounting only for NFS & btrfs), so fs maintainers please have a look whether the changes look sound to you.

This series is based on top of bdi fixes that were merged into the linux-block git tree in the for-next branch. I have pushed out the result as a branch to

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git bdi

Since all patches got reviewed by Christoph, can you please pick them up, Jens? Thanks!

Changes since v2:
* Added Reviewed-by tags from Christoph

Changes since v1:
* Added some acks
* Added further FUSE cleanup patch
* Added removal of unused argument to bdi_register()
* Fixed up some compilation failures spotted by 0-day testing

Honza
[PATCH 08/25] btrfs: Convert to separately allocated bdi
Allocate struct backing_dev_info separately instead of embedding it inside superblock. This unifies handling of bdi among users. CC: Chris Mason CC: Josef Bacik CC: David Sterba CC: linux-btrfs@vger.kernel.org Reviewed-by: Liu Bo Reviewed-by: David Sterba Reviewed-by: Christoph Hellwig Signed-off-by: Jan Kara --- fs/btrfs/ctree.h | 1 - fs/btrfs/disk-io.c | 36 +++- fs/btrfs/super.c | 7 +++ 3 files changed, 14 insertions(+), 30 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 29b7fc28c607..f6019ce20035 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -810,7 +810,6 @@ struct btrfs_fs_info { struct btrfs_super_block *super_for_commit; struct super_block *sb; struct inode *btree_inode; - struct backing_dev_info bdi; struct mutex tree_log_mutex; struct mutex transaction_kthread_mutex; struct mutex cleaner_mutex; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 08b74daf35d0..a7d8c342f604 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1808,21 +1808,6 @@ static int btrfs_congested_fn(void *congested_data, int bdi_bits) return ret; } -static int setup_bdi(struct btrfs_fs_info *info, struct backing_dev_info *bdi) -{ - int err; - - err = bdi_setup_and_register(bdi, "btrfs"); - if (err) - return err; - - bdi->ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_SIZE; - bdi->congested_fn = btrfs_congested_fn; - bdi->congested_data = info; - bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK; - return 0; -} - /* * called by the kthread helper functions to finally call the bio end_io * functions. 
This is where read checksum verification actually happens @@ -2601,16 +2586,10 @@ int open_ctree(struct super_block *sb, goto fail; } - ret = setup_bdi(fs_info, &fs_info->bdi); - if (ret) { - err = ret; - goto fail_srcu; - } - ret = percpu_counter_init(&fs_info->dirty_metadata_bytes, 0, GFP_KERNEL); if (ret) { err = ret; - goto fail_bdi; + goto fail_srcu; } fs_info->dirty_metadata_batch = PAGE_SIZE * (1 + ilog2(nr_cpu_ids)); @@ -2718,7 +2697,6 @@ int open_ctree(struct super_block *sb, sb->s_blocksize = 4096; sb->s_blocksize_bits = blksize_bits(4096); - sb->s_bdi = &fs_info->bdi; btrfs_init_btree_inode(fs_info); @@ -2915,9 +2893,12 @@ int open_ctree(struct super_block *sb, goto fail_sb_buffer; } - fs_info->bdi.ra_pages *= btrfs_super_num_devices(disk_super); - fs_info->bdi.ra_pages = max(fs_info->bdi.ra_pages, - SZ_4M / PAGE_SIZE); + sb->s_bdi->congested_fn = btrfs_congested_fn; + sb->s_bdi->congested_data = fs_info; + sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK; + sb->s_bdi->ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_SIZE; + sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super); + sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE); sb->s_blocksize = sectorsize; sb->s_blocksize_bits = blksize_bits(sectorsize); @@ -3285,8 +3266,6 @@ int open_ctree(struct super_block *sb, percpu_counter_destroy(&fs_info->delalloc_bytes); fail_dirty_metadata_bytes: percpu_counter_destroy(&fs_info->dirty_metadata_bytes); -fail_bdi: - bdi_destroy(&fs_info->bdi); fail_srcu: cleanup_srcu_struct(&fs_info->subvol_srcu); fail: @@ -4007,7 +3986,6 @@ void close_ctree(struct btrfs_fs_info *fs_info) percpu_counter_destroy(&fs_info->dirty_metadata_bytes); percpu_counter_destroy(&fs_info->delalloc_bytes); percpu_counter_destroy(&fs_info->bio_counter); - bdi_destroy(&fs_info->bdi); cleanup_srcu_struct(&fs_info->subvol_srcu); btrfs_free_stripe_hash_table(fs_info); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index da687dc79cce..e0a7503ab31e 100644 --- a/fs/btrfs/super.c 
+++ b/fs/btrfs/super.c
@@ -1133,6 +1133,13 @@ static int btrfs_fill_super(struct super_block *sb,
 #endif
 	sb->s_flags |= MS_I_VERSION;
 	sb->s_iflags |= SB_I_CGROUPWB;
+
+	err = super_setup_bdi(sb);
+	if (err) {
+		btrfs_err(fs_info, "super_setup_bdi failed");
+		return err;
+	}
+
 	err = open_ctree(sb, fs_devices, (char *)data);
 	if (err) {
 		btrfs_err(fs_info, "open_ctree failed");
--
2.12.0
[PATCH 04/25] fs: Provide infrastructure for dynamic BDIs in filesystems
Provide helper functions for setting up dynamically allocated backing_dev_info structures for filesystems and cleaning them up on superblock destruction. CC: linux-...@lists.infradead.org CC: linux-...@vger.kernel.org CC: Petr Vandrovec CC: linux-ni...@vger.kernel.org CC: cluster-de...@redhat.com CC: osd-...@open-osd.org CC: codal...@coda.cs.cmu.edu CC: linux-...@lists.infradead.org CC: ecryp...@vger.kernel.org CC: linux-c...@vger.kernel.org CC: ceph-de...@vger.kernel.org CC: linux-btrfs@vger.kernel.org CC: v9fs-develo...@lists.sourceforge.net CC: lustre-de...@lists.lustre.org Reviewed-by: Christoph Hellwig Signed-off-by: Jan Kara --- fs/super.c | 49 include/linux/backing-dev-defs.h | 2 +- include/linux/fs.h | 6 + 3 files changed, 56 insertions(+), 1 deletion(-) diff --git a/fs/super.c b/fs/super.c index b8b6a086c03b..0f51a437c269 100644 --- a/fs/super.c +++ b/fs/super.c @@ -446,6 +446,11 @@ void generic_shutdown_super(struct super_block *sb) hlist_del_init(&sb->s_instances); spin_unlock(&sb_lock); up_write(&sb->s_umount); + if (sb->s_iflags & SB_I_DYNBDI) { + bdi_put(sb->s_bdi); + sb->s_bdi = &noop_backing_dev_info; + sb->s_iflags &= ~SB_I_DYNBDI; + } } EXPORT_SYMBOL(generic_shutdown_super); @@ -1256,6 +1261,50 @@ mount_fs(struct file_system_type *type, int flags, const char *name, void *data) } /* + * Setup private BDI for given superblock. It gets automatically cleaned up + * in generic_shutdown_super(). + */ +int super_setup_bdi_name(struct super_block *sb, char *fmt, ...) +{ + struct backing_dev_info *bdi; + int err; + va_list args; + + bdi = bdi_alloc(GFP_KERNEL); + if (!bdi) + return -ENOMEM; + + bdi->name = sb->s_type->name; + + va_start(args, fmt); + err = bdi_register_va(bdi, NULL, fmt, args); + va_end(args); + if (err) { + bdi_put(bdi); + return err; + } + WARN_ON(sb->s_bdi != &noop_backing_dev_info); + sb->s_bdi = bdi; + sb->s_iflags |= SB_I_DYNBDI; + + return 0; +} +EXPORT_SYMBOL(super_setup_bdi_name); + +/* + * Setup private BDI for given superblock. 
It gets automatically cleaned up
+ * in generic_shutdown_super().
+ */
+int super_setup_bdi(struct super_block *sb)
+{
+	static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
+
+	return super_setup_bdi_name(sb, "%.28s-%ld", sb->s_type->name,
+				    atomic_long_inc_return(&bdi_seq));
+}
+EXPORT_SYMBOL(super_setup_bdi);
+
+/*
  * This is an internal function, please use sb_end_{write,pagefault,intwrite}
  * instead.
  */
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index e66d4722db8e..866c433e7d32 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -146,7 +146,7 @@ struct backing_dev_info {
 	congested_fn *congested_fn; /* Function pointer if device is md/dm */
 	void *congested_data;	/* Pointer to aux data for congested func */
-	char *name;
+	const char *name;
 	struct kref refcnt;	/* Reference counter for the structure */
 	unsigned int capabilities; /* Device capabilities */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7251f7bb45e8..98cf14ea78c0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1272,6 +1272,9 @@ struct mm_struct;
 /* sb->s_iflags to limit user namespace mounts */
 #define SB_I_USERNS_VISIBLE	0x0010 /* fstype already mounted */
+/* Temporary flag until all filesystems are converted to dynamic bdis */
+#define SB_I_DYNBDI	0x0100
+
 /* Possible states of 'frozen' field */
 enum {
 	SB_UNFROZEN = 0,	/* FS is unfrozen */
@@ -2121,6 +2124,9 @@ extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
 extern bool our_mnt(struct vfsmount *mnt);
+extern __printf(2, 3)
+int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
+extern int super_setup_bdi(struct super_block *sb);
 extern int current_umask(void);
--
2.12.0
Re: BTRFS as a GlusterFS storage back-end, and what I've learned from using it as such.
On 2017-04-12 01:49, Qu Wenruo wrote:

At 04/11/2017 11:40 PM, Austin S. Hemmelgarn wrote:

About a year ago now, I decided to set up a small storage cluster to store backups (and partially replace Dropbox for my usage, but that's a separate story). I ended up using GlusterFS as the clustering software itself, and BTRFS as the back-end storage.

GlusterFS itself is actually a pretty easy workload as far as cluster software goes. It does some processing prior to actually storing the data (a significant amount in fact), but the actual on-device storage on any given node is pretty simple. You have the full directory structure for the whole volume, and whatever files happen to be on that node are located within that tree exactly like they are in the GlusterFS volume. Beyond the basic data, gluster only stores 2-4 xattrs per-file (which are used to track synchronization, and also for its internal data scrubbing), and a directory called .glusterfs in the top of the back-end storage location for the volume which contains the data required to figure out which node a file is on.

Overall, the access patterns mostly mirror whatever is using the Gluster volume, or are reduced to slow streaming writes (when writing files and the back-end nodes are computationally limited instead of I/O limited), with the addition of some serious metadata operations in the .glusterfs directory (lots of stat calls there, together with large numbers of small files).

Any real world experience is welcomed to share.

As far as overall performance, BTRFS is actually on par for this usage with both ext4 and XFS (at least, on my hardware it is), and I actually see more SSD friendly access patterns when using BTRFS in this case than any other FS I tried.

We also find that, for pure buffered read/write, btrfs is no worse than traditional fs.

In our PostgreSQL test, btrfs can even get a little better performance than ext4/xfs when handling DB files.
But if using btrfs for PostgreSQL Write Ahead Log (WAL), then it's completely another thing. Btrfs falls far behind ext4/xfs on HDD, only half of the TPC performance for low concurrency load.

Due to btrfs CoW, btrfs causes extra IO for fsync. For example, if only to fsync 4K data, btrfs can cause a 64K metadata write for default mkfs options. (One tree block for the log root tree, one tree block for the log tree, multiplied by 2 for the default DUP profile.)

After some serious experimentation with various configurations for this during the past few months, I've noticed a handful of other things:

1. The 'ssd' mount option does not actually improve performance on these SSD's. To a certain extent, this actually surprised me at first, but having seen Hans' e-mail and what he found about this option, it actually makes sense, since erase-blocks on these devices are 4MB, not 2MB, and the drives have a very good FTL (so they will aggregate all the little writes properly).

Given this, I'm beginning to wonder if it actually makes sense to not automatically enable this on mount when dealing with certain types of storage (for example, most SATA and SAS SSD's have reasonably good FTL's, so I would expect them to have similar behavior). Extrapolating further, it might instead make sense to just never automatically enable this, and expose the value this option is manipulating as a mount option, as there are other circumstances where setting specific values could improve performance (for example, if you're on hardware RAID6, setting this to the stripe size would probably improve performance on many cheaper controllers).

2. Up to a certain point, running a single larger BTRFS volume with multiple sub-volumes is more computationally efficient than running multiple smaller BTRFS volumes. More specifically, there is lower load on the system and lower CPU utilization by BTRFS itself without much noticeable difference in performance (in my tests it was about 0.5-1% performance difference, YMMV).
To a certain extent this makes some sense, but the turnover point was actually a lot higher than I expected (with this workload, the turnover point was around half a terabyte). This seems to be related to tree locking overhead. My thought too, although I find it interesting that the benefit starts to disappear as the FS grows beyond a certain point (on my system it was about half a terabyte), but I would expect it to be different on systems with different numbers of CPU cores (differing levels of lock contention) or different workloads (probably inversely proportional to the amount of metadata work the workload produces). The most obvious solution is just as you stated: use many small subvolumes rather than one large subvolume. Another less obvious solution is to reduce the tree block size at mkfs time. Current btrfs is not that good at handling metadata workloads, limited by both the overhead of mandatory metadata CoW and the current tree lock algorithm. I believe this to be a side-effect of how we use per-filesystem
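Qu's 4K-fsync example above works out as simple arithmetic. A minimal sketch of that calculation (assuming the default 16K nodesize and DUP metadata profile, as in the mail; the helper name is made up for illustration, and real fsync cost also includes the data itself and superblock writes, which are ignored here):

```python
# Simplified model of the metadata write amplification btrfs CoW adds to
# a small fsync under default mkfs options, per Qu's example above.

NODESIZE = 16 * 1024          # default btrfs nodesize (16K)

def fsync_metadata_bytes(log_tree_blocks=1, log_root_blocks=1,
                         metadata_copies=2):
    """Metadata bytes written for one fsync.

    log_tree_blocks:  tree blocks CoW'd in the log tree
    log_root_blocks:  tree blocks CoW'd in the log root tree
    metadata_copies:  2 for the default DUP metadata profile
    """
    return (log_tree_blocks + log_root_blocks) * NODESIZE * metadata_copies

data = 4 * 1024               # the 4K of actual data being fsynced
meta = fsync_metadata_bytes() # 64K, matching the mail
print(f"data {data}B -> metadata {meta}B ({meta // data}x amplification)")
```

This also makes it clear why reducing the nodesize at mkfs time, as Qu suggests, directly reduces the per-fsync metadata cost.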
Re: Btrfs disk layout question
On 2017-04-12 00:18, Chris Murphy wrote: On Tue, Apr 11, 2017 at 3:00 PM, Adam Borowski wrote: On Tue, Apr 11, 2017 at 12:15:32PM -0700, Amin Hassani wrote: I am working on a project with Btrfs and I was wondering if there is any way to see the disk layout of the btrfs image. Let's assume I have a read-only btrfs image with compression on and only using one disk (no raid or anything). Is it possible to get a set of offset-lengths for each file While btrfs-specific ioctls give more information, you might want to look at FIEMAP (Documentation/filesystems/fiemap.txt) as it works on most filesystems, not just btrfs. One interface to FIEMAP is provided in "/usr/sbin/filefrag -v". Good idea. Although, on Btrfs I'm pretty sure it reports the Btrfs (internal) logical addressing; not the actual physical sector address on the drive. So it depends on what the original poster is trying to discover. That said, there is a tool to translate that back, and depending on how detailed you want to get, that may be more efficient than debug tree. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
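For consuming `filefrag -v` output programmatically, a small parsing sketch (the sample below mimics e2fsprogs filefrag output and is invented for illustration; as Chris notes above, on btrfs the "physical" column is btrfs-internal logical addressing, not a raw device sector):

```python
import re

# Parse `filefrag -v` output into (logical, physical, length) tuples,
# all in bytes. On btrfs, "physical" here is the internal logical
# address, which still needs chunk-tree translation to reach the disk.

SAMPLE = """\
Filesystem type is: 9123683e
File size of file is 65536 (16 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      15:       6272..      6287:     16:             last,eof
"""

def parse_filefrag(text):
    blocksize = None
    extents = []
    for line in text.splitlines():
        m = re.search(r"blocks? of (\d+) bytes", line)
        if m:
            blocksize = int(m.group(1))
            continue
        m = re.match(r"\s*\d+:\s+(\d+)\.\.\s*\d+:\s+(\d+)\.\.\s*\d+:\s+(\d+):",
                     line)
        if m and blocksize:
            logical, physical, length = (int(x) * blocksize
                                         for x in m.groups())
            extents.append((logical, physical, length))
    return extents

print(parse_filefrag(SAMPLE))  # [(0, 25690112, 65536)]
```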
Re: [PATCH] fstests: introduce btrfs-map-logical
On Wed, Apr 12, 2017 at 09:35:00AM +0800, Qu Wenruo wrote: > > > At 04/12/2017 09:27 AM, Liu Bo wrote: > > A typical use case of 'btrfs-map-logical' is to translate btrfs logical > > address to physical address on each disk. > > Could we avoid usage of btrfs-map-logical here? Agreed. > I understand that we need to do corruption so that we can test if the > repair works, but I'm not sure if the output format will change, or if > the program will get replaced by the "btrfs inspect-internal" group. In the long term it will be replaced, but there's no ETA.
Re: [PATCH] fstests: remove snapshot aware defrag test
On Tue, Apr 11, 2017 at 06:27:18PM -0700, Liu Bo wrote: > Since snapshot aware defrag has been disabled in the kernel, and we all have > learned to ignore the failure of btrfs/010, let's just remove it. > > Signed-off-by: Liu Bo Reviewed-by: David Sterba
Re: [PATCH] fstests: introduce btrfs-map-logical
On Wed, Apr 12, 2017 at 02:32:02PM +0200, David Sterba wrote: > On Wed, Apr 12, 2017 at 09:35:00AM +0800, Qu Wenruo wrote: > > > > > > At 04/12/2017 09:27 AM, Liu Bo wrote: > > > A typical use case of 'btrfs-map-logical' is to translate btrfs logical > > > address to physical address on each disk. > > > > Could we avoid usage of btrfs-map-logical here? > > Agreed. > > > I understand that we need to do corruption so that we can test if the > > repair works, but I'm not sure if the output format will change, or if > > the program will get replaced by the "btrfs inspect-internal" group. > > In the long term it will be replaced, but there's no ETA. Possibly, if the fstests maintainer agrees, we can add btrfs-map-logical to fstests. It's small and uses headers from libbtrfs, so this would become a new dependency, but I believe it is still bearable. I'm not sure we should export all debugging functionality in 'btrfs', as this is typically something that a user will never want, not even in emergency environments. There's an overlap in the information to be exported, but I'd be more inclined to satisfy user needs than testsuite needs. So an independent tool would give us more freedom on both sides.
Re: [PATCH] Btrfs: remove some dead code
On Tue, Apr 11, 2017 at 11:57:15AM +0300, Dan Carpenter wrote: > btrfs_get_extent() never returns NULL pointers, so this code introduces > a static checker warning. > > The btrfs_get_extent() is a bit complex, but trust me that it doesn't > return NULLs and also if it did we would trigger the BUG_ON(!em) before > the last return statement. > > Signed-off-by: Dan Carpenter Added to 4.12, thanks. I've updated the subject line so it reflects what the patch does.
Re: [PATCH 07/12] fs: btrfs: Use ktime_get_real_ts for root ctime
On Fri, Apr 07, 2017 at 05:57:05PM -0700, Deepa Dinamani wrote: > btrfs_root_item maintains the ctime for root updates. > This is not part of vfs_inode. > > Since current_time() uses struct inode* as an argument > as Linus suggested, this cannot be used to update root > times unless we modify the signature to use inode. > > Since btrfs uses nanosecond time granularity, it can also > use ktime_get_real_ts directly to obtain the timestamp for > the root. It is necessary to use the timespec time API > here because the same btrfs_set_stack_timespec_*() APIs > are used for vfs inode times as well. These can be > transitioned to using timespec64 when btrfs internally > changes to use timespec64 as well. > > Signed-off-by: Deepa Dinamani > Acked-by: David Sterba > Reviewed-by: Arnd Bergmann I'm going to add the patch to my 4.12 queue and will let Andrew know.
Re: [PATCH v3] btrfs: fiemap: Cache and merge fiemap extent before submit it to user
On Fri, Apr 07, 2017 at 10:43:15AM +0800, Qu Wenruo wrote: > [BUG] > Cycle mounting btrfs can cause fiemap to return different results. > Like: > # mount /dev/vdb5 /mnt/btrfs > # dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file > # xfs_io -c "fiemap -v" /mnt/btrfs/file > /mnt/test/file: > EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS > 0: [0..127]: 25088..25215 128 0x1 > # umount /mnt/btrfs > # mount /dev/vdb5 /mnt/btrfs > # xfs_io -c "fiemap -v" /mnt/btrfs/file > /mnt/test/file: > EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS > 0: [0..31]: 25088..25119 32 0x0 > 1: [32..63]: 25120..25151 32 0x0 > 2: [64..95]: 25152..25183 32 0x0 > 3: [96..127]: 25184..25215 32 0x1 > But after the above fiemap, we get the correct merged result if we call fiemap > again. > # xfs_io -c "fiemap -v" /mnt/btrfs/file > /mnt/test/file: > EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS > 0: [0..127]: 25088..25215 128 0x1 > > [REASON] > Btrfs will try to merge extent maps when inserting a new extent map. > > btrfs_fiemap(start=0 len=(u64)-1) > |- extent_fiemap(start=0 len=(u64)-1) > |- get_extent_skip_holes(start=0 len=64k) > | |- btrfs_get_extent_fiemap(start=0 len=64k) > | |- btrfs_get_extent(start=0 len=64k) > | | Found on-disk (ino, EXTENT_DATA, 0) > | |- add_extent_mapping() > | |- Return (em->start=0, len=16k) > | > |- fiemap_fill_next_extent(logic=0 phys=X len=16k) > | > |- get_extent_skip_holes(start=0 len=64k) > | |- btrfs_get_extent_fiemap(start=0 len=64k) > | |- btrfs_get_extent(start=16k len=48k) > | | Found on-disk (ino, EXTENT_DATA, 16k) > | |- add_extent_mapping() > | | |- try_merge_map() > | | Merge with previous em start=0 len=16k > | | resulting em start=0 len=32k > | |- Return (em->start=0, len=32K) << Merged result > |- Strip off the unrelated range (0~16K) of the returned em > |- fiemap_fill_next_extent(logic=16K phys=X+16K len=16K) > ^^^ Causing split fiemap extent. > > And since in add_extent_mapping(), the em is already merged, in the next > fiemap() call we will get the merged result.
> > [FIX] > Here we introduce a new structure, fiemap_cache, which records the previous > fiemap extent. > > We will always try to merge the current fiemap extent with the cached one before > calling fiemap_fill_next_extent(). > Only when we fail to merge the current fiemap extent with the cached one do we > call fiemap_fill_next_extent() to submit the cached one. > > So by this method, we can merge all fiemap extents. The cache gets reset on each call to extent_fiemap, so if fi_extents_max is 1, the cache will be always unset and we'll never merge anything. The same can happen if the number of extents reaches the limit (FIEMAP_MAX_EXTENTS or any other depending on the ioctl caller). And this leads to the unmerged extents. > It can also be done in fs/ioctl.c, however the problem is if > fieinfo->fi_extents_max == 0, we have no space to cache the previous fiemap > extent. I don't see why, it's the same code path, no? > So I chose to merge it in btrfs. Lifting that to the vfs interface is probably not the right approach. The ioctl has never done any postprocessing of the data returned by filesystems, it's really up to the filesystem to prepare the data. > Signed-off-by: Qu Wenruo > --- > v2: > Since fiemap_extent_info has a limit on the number of fiemap_extents, it's possible > that fiemap_fill_next_extent() returns 1 halfway. Remove the WARN_ON() which can > cause a kernel warning if fiemap is called on a large compressed file. > v3: > Rename finish_fiemap_extent() to check_fiemap_extent(), as in v3 we ensured > that submit_fiemap_extent() submits the fiemap cache, so it just acts as a > sanity check. > Remove the BTRFS_MAX_EXTENT_SIZE limit in submit_fiemap_extent(), as an > extent map can be larger than BTRFS_MAX_EXTENT_SIZE. > Don't do backward jumps, suggested by David. > Better sanity checks and a recoverable fix. > > To David: > What about adding a btrfs_debug_warn(), which will only call WARN_ON(1) if > BTRFS_CONFIG_DEBUG is specified, for recoverable bugs?
> > And modify ASSERT() to always WARN_ON() and exit error code? That's for a separate discussion. > --- > fs/btrfs/extent_io.c | 124 > ++- > 1 file changed, 122 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 28e8192..c4cb65d 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -4353,6 +4353,123 @@ static struct extent_map > *get_extent_skip_holes(struct inode *inode, > return NULL; > } > > +/* > + * To cache previous fiemap extent > + * > + * Will be used for merging fiemap extent > + */ > +struct fiemap_cache { > + u64 offset; > + u64 phys; > + u64 len; > + u32 f
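The merging rule the changelog describes can be modelled outside the kernel. A toy sketch of the fiemap_cache idea (the real patch is C inside extent_fiemap(); this only illustrates the merge condition, not the FIEMAP_EXTENT_LAST handling or the error paths):

```python
# Buffer the previous extent and merge the next one into it when both
# the logical and the physical ranges are contiguous and the flags
# match; only submit the cached extent when a merge is impossible.

def merge_fiemap(extents):
    """extents: iterable of (logical, phys, length, flags) tuples."""
    out = []
    cache = None
    for logical, phys, length, flags in extents:
        if (cache is not None
                and cache[0] + cache[2] == logical
                and cache[1] + cache[2] == phys
                and cache[3] == flags):
            # Contiguous and same flags: extend the cached extent.
            cache = (cache[0], cache[1], cache[2] + length, flags)
        else:
            # Can't merge: submit the cached extent, start a new one.
            if cache is not None:
                out.append(cache)
            cache = (logical, phys, length, flags)
    if cache is not None:
        out.append(cache)
    return out

# Four split 16K extents (as in the changelog's second fiemap output,
# with an invented physical base) collapse into one 64K extent:
split = [(i * 0x4000, 0x6200000 + i * 0x4000, 0x4000, 0) for i in range(4)]
assert merge_fiemap(split) == [(0, 0x6200000, 0x10000, 0)]
```

This also makes David's point concrete: with fi_extents_max == 1 the cache is flushed immediately on every extent, so nothing ever merges.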
Re: Btrfs disk layout question
12.04.2017 14:20, Austin S. Hemmelgarn writes: > On 2017-04-12 00:18, Chris Murphy wrote: >> On Tue, Apr 11, 2017 at 3:00 PM, Adam Borowski >> wrote: >>> On Tue, Apr 11, 2017 at 12:15:32PM -0700, Amin Hassani wrote: I am working on a project with Btrfs and I was wondering if there is any way to see the disk layout of the btrfs image. Let's assume I have a read-only btrfs image with compression on and only using one disk (no raid or anything). Is it possible to get a set of offset-lengths for each file >>> >>> While btrfs-specific ioctls give more information, you might want to >>> look at >>> FIEMAP (Documentation/filesystems/fiemap.txt) as it works on most >>> filesystems, not just btrfs. One interface to FIEMAP is provided in >>> "/usr/sbin/filefrag -v". >> >> Good idea. Although, on Btrfs I'm pretty sure it reports the Btrfs >> (internal) logical addressing; not the actual physical sector address >> on the drive. So it depends on what the original poster is trying to >> discover. >> > That said, there is a tool to translate that back, and depending on how > detailed you want to get, that may be more efficient than debug tree. Could you give a pointer to this tool? I use filefrag in bootinfoscript to display the physical disk offset of files of interest to the bootloader. I was not aware it shows the logical offset, which makes it kinda pointless.
Re: Btrfs disk layout question
Hi, Thanks for the responses. I actually need the physical addresses. FIEMAP I believe (and I tested) gives the logical address, which as Andrei mentioned is useless for this. I'm assuming btrfs-debug-tree gives the physical addresses, right? I also need to know the compression method used on each extent and I don't think I can get that with the fiemap stuff. It seems that fiemap capability is not implemented in Btrfs as I'm looking at the Btrfs implementation and based on the documentation of fiemap: "File systems wishing to support fiemap must implement a ->fiemap callback on their inode_operations structure. The fs ->fiemap call is responsible for defining its set of supported fiemap flags, and calling a helper function on each discovered extent:" Thanks, Amin. On Wed, Apr 12, 2017 at 9:44 AM, Andrei Borzenkov wrote: > 12.04.2017 14:20, Austin S. Hemmelgarn writes: >> On 2017-04-12 00:18, Chris Murphy wrote: >>> On Tue, Apr 11, 2017 at 3:00 PM, Adam Borowski >>> wrote: On Tue, Apr 11, 2017 at 12:15:32PM -0700, Amin Hassani wrote: > I am working on a project with Btrfs and I was wondering if there is > any way to see the disk layout of the btrfs image. Let's assume I have > a read-only btrfs image with compression on and only using one disk > (no raid or anything). Is it possible to get a set of offset-lengths > for each file While btrfs-specific ioctls give more information, you might want to look at FIEMAP (Documentation/filesystems/fiemap.txt) as it works on most filesystems, not just btrfs. One interface to FIEMAP is provided in "/usr/sbin/filefrag -v". >>> >>> Good idea. Although, on Btrfs I'm pretty sure it reports the Btrfs >>> (internal) logical addressing; not the actual physical sector address >>> on the drive. So it depends on what the original poster is trying to >>> discover. >>> >> That said, there is a tool to translate that back, and depending on how >> detailed you want to get, that may be more efficient than debug tree. > > Could you give pointer to this tool?
I use filefrag on bootinfoscript to > display physical disk offset of files of interest to bootloader. I was > not aware it shows logical offset which makes it kinda pointless. -- Amin Hassani.
Re: Btrfs disk layout question
On 2017-04-12 12:44, Andrei Borzenkov wrote: 12.04.2017 14:20, Austin S. Hemmelgarn writes: On 2017-04-12 00:18, Chris Murphy wrote: On Tue, Apr 11, 2017 at 3:00 PM, Adam Borowski wrote: On Tue, Apr 11, 2017 at 12:15:32PM -0700, Amin Hassani wrote: I am working on a project with Btrfs and I was wondering if there is any way to see the disk layout of the btrfs image. Let's assume I have a read-only btrfs image with compression on and only using one disk (no raid or anything). Is it possible to get a set of offset-lengths for each file While btrfs-specific ioctls give more information, you might want to look at FIEMAP (Documentation/filesystems/fiemap.txt) as it works on most filesystems, not just btrfs. One interface to FIEMAP is provided in "/usr/sbin/filefrag -v". Good idea. Although, on Btrfs I'm pretty sure it reports the Btrfs (internal) logical addressing; not the actual physical sector address on the drive. So it depends on what the original poster is trying to discover. That said, there is a tool to translate that back, and depending on how detailed you want to get, that may be more efficient than debug tree. Could you give pointer to this tool? I use filefrag on bootinfoscript to display physical disk offset of files of interest to bootloader. I was not aware it shows logical offset which makes it kinda pointless. Looking again, I think I was thinking of `btrfs inspect-internal logical-resolve`, which actually is more like a reverse fiemap (you give it a logical address, and it spits out paths to all the files that include that logical address), so such a tool may not actually exist (at least, not in the standard tools).
Re: Btrfs disk layout question
btrfs-map-logical is the tool that will convert logical to physical and also give what device it's on; but the device notation is copy 1 and copy 2, so you have to infer what device that is, it's not explicit. Chris Murphy
Re: Btrfs disk layout question
On 04/11/2017 09:15 PM, Amin Hassani wrote: > > I am working on a project with Btrfs and I was wondering if there is > any way to see the disk layout of the btrfs image. Let's assume I have > a read-only btrfs image with compression on and only using one disk > (no raid or anything). > Is it possible to get a set of offset-lengths > for each file or metadata parts of the image. These are two very different things, and it's unclear to me what you actually want. Do you want: 1. a layout of physical disk space, and then for each range see if it's used for data, metadata or not used? 2. a list of files and how they're split up (or not) in one or multiple extents, and how long those are? Remember that multiple files can reuse part of each other's data in btrfs. So if you follow the files, and you have reflinked copies or subvolume snapshots, then you see actual disk usage multiple times. > I know there is an > unfinished documentation for On-disk Format in here: > https://btrfs.wiki.kernel.org/index.php/On-disk_Format > But it is not complete and does not show what I am looking for. Is > there any other documentation on this? Is there any public API that I > can use to get this information. ... > For example can I iterate on all > files starting from the root node and get all offset-lengths? This way > any part that doesn't come can be assumed as metadata. I don't really > care what is inside the metadata, I just want to know their > offset-lengths in the file system. No, that's not how it works. To learn more about how btrfs organizes data internally, you need a good understanding of these concepts: * how btrfs allocates "chunks" (often 256MiB or 1GiB size) of raw disk space and dedicates them to either data or metadata. * how btrfs uses a "virtual address space" and how that maps back from (dev tree) and forth (chunk tree) to raw physical disk space on either of the disks that is attached to the filesystem.
* how btrfs stores the administration of exactly which part of that virtual address space is in use (extent tree). * how btrfs stores files and directories, and how it does so for multiple directory trees (subvolumes), (the fs tree and all 256 <= trees <= -256). * how files in these file trees reference data from data extents. * how extents reference back to which (can be multiple!) files they're used in. IOW, there are likely multiple levels of indirection that you need to follow to find things out. Currently there's no perfect tutorial that explains exactly all those things in a nice way. The btrfs wiki can help with this, and the btrfs-heatmap tool which was already mentioned is nice to play around with, to get a better understanding of the address space and usage. If you know exactly what the end result would be, then it's probably possible to build something that uses the SEARCH IOCTL with which you can search in all metadata (containing info of above mentioned trees) of a live filesystem. At least for C and for python there's enough example code around to do so. -- Hans van Kranenburg
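The chunk-tree point Hans makes (logical-to-physical translation is a range lookup over the chunk map) can be illustrated with a toy sketch. The chunk map values below are invented for illustration; real chunk items also carry a RAID profile and possibly several stripes, but a single-device, no-RAID image as in Amin's question keeps it to one stripe per chunk:

```python
# Toy model of the chunk-tree lookup: each chunk maps a range of the
# btrfs virtual (logical) address space to a physical byte range on a
# device. Translating a logical address is then a range search.

# (logical_start, length, devid, physical_start) -- hypothetical values
CHUNK_MAP = [
    (0x1000000, 0x1000000, 1, 0x400000),    # e.g. a metadata chunk
    (0x2000000, 0x40000000, 1, 0x1400000),  # e.g. a 1GiB data chunk
]

def logical_to_physical(logical):
    for log_start, length, devid, phys_start in CHUNK_MAP:
        if log_start <= logical < log_start + length:
            return devid, phys_start + (logical - log_start)
    raise ValueError("logical address not mapped by any chunk")

# An address 0x5000 bytes into the data chunk lands 0x5000 bytes into
# that chunk's physical range on device 1.
print(logical_to_physical(0x2000000 + 0x5000))  # device 1, phys 0x1405000
```

This is also what `btrfs-map-logical` does for real, reading the actual chunk tree instead of a hard-coded table.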
Re: [PATCH 5/9] nowait aio: return on congested block device
On 04/12/2017 03:36 AM, Christoph Hellwig wrote: > As mentioned last time around, this should be a REQ_NOWAIT flag so > that it can be easily passed down to the request layer. > >> +static inline void bio_wouldblock_error(struct bio *bio) >> +{ >> +	bio->bi_error = -EAGAIN; >> +	bio_endio(bio); >> +} > > Please skip this helper.. Why? It is being called three times. I am incorporating all the rest of the comments, besides this one. Thanks. > >> +#define QUEUE_FLAG_NOWAIT 28 /* queue supports BIO_NOWAIT */ > > Please make the flag name a little more descriptive, this sounds like > it will never wait. > -- Goldwyn
Re: [PATCH] fstests: regression test for btrfs dio read repair
On Wed, Apr 12, 2017 at 10:42:47AM +0100, Filipe Manana wrote: > On Wed, Apr 12, 2017 at 2:27 AM, Liu Bo wrote: > > This case tests whether dio read can repair the bad copy if we have > > a good copy. > > Regardless of being a test we should have always had (thanks for > this!), it would be useful to mention we had a regression (as the test > description in the btrfs/140 file says) and which patch fixed it (and > possibly which kernel version or patch/commit introduced the > regression). > Sure, thanks for the review. > Just a comment/question below. > > > > > Signed-off-by: Liu Bo > > --- > > tests/btrfs/140 | 152 > > > > tests/btrfs/140.out | 39 ++ > > tests/btrfs/group | 1 + > > 3 files changed, 192 insertions(+) > > create mode 100755 tests/btrfs/140 > > create mode 100644 tests/btrfs/140.out > > > > diff --git a/tests/btrfs/140 b/tests/btrfs/140 > > new file mode 100755 > > index 000..db56123 > > --- /dev/null > > +++ b/tests/btrfs/140 > > @@ -0,0 +1,152 @@ > > +#! /bin/bash > > +# FS QA Test 140 > > +# > > +# Regression test for btrfs DIO read's repair during read. > > +# > > +#--- > > +# Copyright (c) 2017 Liu Bo. All Rights Reserved. > > +# > > +# This program is free software; you can redistribute it and/or > > +# modify it under the terms of the GNU General Public License as > > +# published by the Free Software Foundation. > > +# > > +# This program is distributed in the hope that it would be useful, > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > +# GNU General Public License for more details. 
> > +# > > +# You should have received a copy of the GNU General Public License > > +# along with this program; if not, write the Free Software Foundation, > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > +#--- > > +# > > + > > +seq=`basename $0` > > +seqres=$RESULT_DIR/$seq > > +echo "QA output created by $seq" > > + > > +here=`pwd` > > +tmp=/tmp/$$ > > +status=1 # failure is the default! > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > + > > +_cleanup() > > +{ > > + cd / > > + rm -f $tmp.* > > +} > > + > > +# get standard environment, filters and checks > > +. ./common/rc > > +. ./common/filter > > + > > +# remove previous $seqres.full before test > > +rm -f $seqres.full > > + > > +# real QA test starts here > > + > > +# Modify as appropriate. > > +_supported_fs btrfs > > +_supported_os Linux > > +_require_scratch_dev_pool 2 > > +_require_command "$BTRFS_MAP_LOGICAL_PROG" btrfs-map-logical > > +_require_command "$FILEFRAG_PROG" filefrag > > +_require_odirect > > + > > +# helpe to convert 'file offset' to btrfs logical offset > > +FILEFRAG_FILTER=' > > + if (/blocks? of (\d+) bytes/) { > > + $blocksize = $1; > > + next > > + } > > + ($ext, $logical, $physical, $length) = > > + (/^\s*(\d+):\s+(\d+)..\s+\d+:\s+(\d+)..\s+\d+:\s+(\d+):/) > > + or next; > > + ($flags) = /.*:\s*(\S*)$/; > > + print $physical * $blocksize, "#", > > + $length * $blocksize, "#", > > + $logical * $blocksize, "#", > > + $flags, " "' > > + > > +# this makes filefrag output script readable by using a perl helper. 
> > +# output is one extent per line, with three numbers separated by '#' > > +# the numbers are: physical, length, logical (all in bytes) > > +# sample output: "1234#10#5678" -> physical 1234, length 10, logical 5678 > > +_filter_extents() > > +{ > > + tee -a $seqres.full | $PERL_PROG -ne "$FILEFRAG_FILTER" > > +} > > + > > +_check_file_extents() > > +{ > > + cmd="filefrag -v $1" > > + echo "# $cmd" >> $seqres.full > > + out=`$cmd | _filter_extents` > > + if [ -z "$out" ]; then > > + return 1 > > + fi > > + echo "after filter: $out" >> $seqres.full > > + echo $out > > + return 0 > > +} > > + > > +_check_repair() > > +{ > > + filter=${1:-cat} > > + dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | > > tac | $filter | grep -q -e "csum failed" > > + if [ $? -eq 0 ]; then > > + echo 1 > > + else > > + echo 0 > > + fi > > +} > > + > > +_scratch_dev_pool_get 2 > > +# step 1, create a raid1 btrfs which contains one 128k file. > > +echo "step 1..mkfs.btrfs" >>$seqres.full > > + > > +mkfs_opts="-d raid1" > > +_scratch_pool_mkfs $mkfs_opts >>$seqres.full 2>&1 > > + > > +_scratch_mount -o nospace_cache > > Why do we need to mount without space cache? > I don't see why nor I think it's obvious. A comment in the test > mentioning why would be useful for everyone. > Thanks for spotting it, we can safely get rid of it,
Re: BTRFS as a GlusterFS storage back-end, and what I've learned from using it as such.
Austin S. Hemmelgarn posted on Wed, 12 Apr 2017 07:18:44 -0400 as excerpted: > On 2017-04-12 01:49, Qu Wenruo wrote: >> >> At 04/11/2017 11:40 PM, Austin S. Hemmelgarn wrote: >>> >>> 4. Depending on other factors, compression can actually slow you down >>> pretty significantly. In the particular case I saw this happen (all >>> cores completely utilized by userspace software), LZO compression >>> actually caused around 5-10% performance degradation compared to no >>> compression. This is somewhat obvious once it's explained, but it's >>> not exactly intuitive and as such it's probably worth documenting in >>> the man pages that compression won't always make things better. I may >>> send a patch to add this at some point in the near future. >> >> This seems interesting. >> Maybe it's CPU limiting the performance? > In this case, I'm pretty certain that that's the cause. I've only ever > seen this happen though when the CPU was under either full or more than > full load (so pretty much full utilization of all the cores), and it > gets worse as the CPU load increases. This seems blatantly obvious to me, no explanation needed, at least assuming people understand what compression is and does. It certainly doesn't seem btrfs-specific to me. Which makes me wonder if I'm missing something that would seem to counteract the obvious, but doesn't in this case. Compression at its most basic can be described as a tradeoff of CPU cycles to decrease data size (by tracking and eliminating internal redundancy), and thus transfer time of the data. In conditions where the bottleneck is (seek and) transfer time, as on hdds with mostly idle CPUs, compression therefore tends to be a pretty big performance boost because the lower size of the compressed data means fewer seeks and lower transfer time, and because that's where the bottleneck is, making it more efficient increases the performance of the entire thing.
But the context here is SSDs, with 0 seek time and fast transfer speeds, and already 100% utilized CPUs, so the bottleneck is the 100% utilized CPUs and the increased CPU cycles necessary for the compression/decompression simply increase the CPU bottleneck. So far from a mystery, this seems so basic to me that the simplest dunderhead should get it, at least as long as they aren't /so/ simple they can't understand the tradeoff inherent in the simplest compression basics. But that's not the implication of the discussion quoted above, and the participants are both what I'd consider far more qualified to understand and deal with this sort of thing than I, so I /gotta/ be missing something that despite my correct ultimate conclusion, means I haven't reached it using a correct logic train, and that there /must/ be some logic steps in there that I've left out that would intuitively switch the logic, making this a rather less intuitive conclusion than I'm thinking. So what am I missing? Or is it simply that the tradeoff between CPU usage and data size and minimum transit time isn't as simple and basic for most people as I'm assuming here, such that it isn't obviously giving more work to an already bottlenecked CPU, reducing the performance when it /is/ the CPU that's bottlenecked? -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
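Duncan's bottleneck argument can be put into numbers. A toy pipelined model in which throughput is set by the slower of the CPU and I/O stages; all figures below are invented purely for illustration, not measured:

```python
# In a pipelined write path, the cost per unit of data is roughly
# max(CPU time, I/O time): the slower stage sets the pace. Compression
# trades CPU time for I/O time, so it only wins while I/O dominates.

def time_per_mb(cpu_ms, io_ms):
    return max(cpu_ms, io_ms)   # the bottleneck stage

# HDD with a mostly idle CPU: I/O-bound, so halving I/O via compression
# wins even though it costs extra CPU.
hdd_plain = time_per_mb(cpu_ms=1, io_ms=10)
hdd_lzo   = time_per_mb(cpu_ms=3, io_ms=5)

# SSD with a saturated CPU (Austin's case): CPU-bound, so the extra CPU
# work for compression is pure loss.
ssd_plain = time_per_mb(cpu_ms=8, io_ms=2)
ssd_lzo   = time_per_mb(cpu_ms=10, io_ms=1)

print(f"HDD: {hdd_plain}ms -> {hdd_lzo}ms per MB with compression (faster)")
print(f"SSD: {ssd_plain}ms -> {ssd_lzo}ms per MB with compression (slower)")
```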
Re: [PATCH v2] btrfs-progs: send-dump: always print a space after path
Evan Danaher posted on Tue, 11 Apr 2017 12:33:40 -0400 as excerpted: > I was shocked to discover that 'btrfs receive --dump' doesn't print a > space after long filenames, so it runs together into the metadata; for > example: > > truncate./20-00-03/this-name-is-32-characters-longsize=0 > > This is a trivial patch to add a single space unconditionally, so the > result is the following: > > truncate./20-00-03/this-name-is-32-characters-long size=0 > > I suppose this is technically a breaking change, but it seems unlikely > to me that anyone would depend on the existing behavior given how > unfriendly it is. > > Signed-off-by: Evan Danaher > --- I'm not a dev so won't attempt to comment on the patch itself, but it's worth noting that according to kernel patch submission guidelines (which btrfs-progs use as well) on V2+ patch postings, there should be a short, often one-line per version, summary of what changed between versions. This helps both reviewers and would-be patch-using admins such as myself understand how a patch is evolving, as well as for reviewers preventing unnecessary work when re-reviewing a new version of a patch previously reviewed in an earlier version. On patch series this summary is generally found in the 0/N post, while on individual patches without a 0/N, it's normally found below the first --- delimiter, so as to avoid including the patch history in the final merged version comment. See pretty much any other multi-version posted patch for examples. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: [PATCH v3] btrfs: fiemap: Cache and merge fiemap extent before submit it to user
At 04/12/2017 11:05 PM, David Sterba wrote:
On Fri, Apr 07, 2017 at 10:43:15AM +0800, Qu Wenruo wrote:

[BUG]
Cycle mounting btrfs can cause fiemap to return different results. Like:

  # mount /dev/vdb5 /mnt/btrfs
  # dd if=/dev/zero bs=16K count=4 oflag=dsync of=/mnt/btrfs/file
  # xfs_io -c "fiemap -v" /mnt/btrfs/file
  /mnt/test/file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..127]:        25088..25215       128   0x1

  # umount /mnt/btrfs
  # mount /dev/vdb5 /mnt/btrfs
  # xfs_io -c "fiemap -v" /mnt/btrfs/file
  /mnt/test/file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..31]:         25088..25119        32   0x0
     1: [32..63]:        25120..25151        32   0x0
     2: [64..95]:        25152..25183        32   0x0
     3: [96..127]:       25184..25215        32   0x1

But after the above fiemap, we get the correct merged result if we call fiemap again:

  # xfs_io -c "fiemap -v" /mnt/btrfs/file
  /mnt/test/file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..127]:        25088..25215       128   0x1

[REASON]
Btrfs will try to merge extent maps when inserting a new extent map.

  btrfs_fiemap(start=0 len=(u64)-1)
  |- extent_fiemap(start=0 len=(u64)-1)
     |- get_extent_skip_holes(start=0 len=64k)
     |  |- btrfs_get_extent_fiemap(start=0 len=64k)
     |     |- btrfs_get_extent(start=0 len=64k)
     |        |  Found on-disk (ino, EXTENT_DATA, 0)
     |        |- add_extent_mapping()
     |        |- Return (em->start=0, len=16k)
     |- fiemap_fill_next_extent(logic=0 phys=X len=16k)
     |- get_extent_skip_holes(start=0 len=64k)
     |  |- btrfs_get_extent_fiemap(start=0 len=64k)
     |     |- btrfs_get_extent(start=16k len=48k)
     |        |  Found on-disk (ino, EXTENT_DATA, 16k)
     |        |- add_extent_mapping()
     |        |  |- try_merge_map()
     |        |     Merge with previous em start=0 len=16k
     |        |     resulting em start=0 len=32k
     |        |- Return (em->start=0, len=32k)  << Merged result
     |- Strip off the unrelated range (0~16K) of the returned em
     |- fiemap_fill_next_extent(logic=16K phys=X+16K len=16K)
        ^^^ Causing split fiemap extent.

And since in add_extent_mapping() the em is already merged, in the next fiemap() call we will get the merged result.

[FIX]
Here we introduce a new structure, fiemap_cache, which records the previous fiemap extent.
And we will always try to merge the current fiemap extent with the cached one before calling fiemap_fill_next_extent(). Only when we fail to merge the current fiemap extent with the cached one do we call fiemap_fill_next_extent() to submit the cached one. By this method, we can merge all fiemap extents.

The cache gets reset on each call to extent_fiemap, so if fi_extents_max is 1, the cache will always be unset and we'll never merge anything. The same can happen if the number of extents reaches the limit (FIEMAP_MAX_EXTENTS or any other, depending on the ioctl caller). And this leads to the unmerged extents.

Nope, extents will still be merged, as long as they can be merged. A fiemap extent is only submitted when we find an unmergeable extent. Even if fi_extents_max is 1, it is still possible for us to merge extents.

File A:
  Extent 1: offset=0  len=4k phys=X
  Extent 2: offset=4k len=4k phys=X+4k
  Extent 3: offset=8k len=4k phys=Y

1) Found Extent 1: cache it, not submitted yet.
2) Found Extent 2: merge it with the cached one, not submitted yet.
3) Found Extent 3: can't merge, so submit the cached one first.
   The submitted count reaches fi_extents_max, so exit the current extent_fiemap.
4) The next fiemap call starts from offset 8K. Extent 3 is the last extent; no need to cache it, just submit it.

So we still get merged fiemap extents, without anything wrong. The point is, fi_extents_max or any other limit can only be hit when we submit a fiemap extent, and in that case either we found an unmergeable extent or we already hit the last extent.

It can also be done in fs/ioctl.c; however, the problem is that if fieinfo->fi_extents_max == 0, we have no space to cache the previous fiemap extent.

I don't see why, it's the same code path, no?

My original design in VFS was to check whether we can merge the current fiemap extent with the last one in fiemap_extent_info. But for the fi_extents_max == 0 case, fiemap_extent_info doesn't store any extents, so that's not possible.
So for the fi_extents_max == 0 case, either do it in each filesystem, like what we are doing, or introduce a new function like fiemap_cache_next_extent() with a reference to the cached structure. So I chose to merge it in btrfs.

Lifting that to the VFS interface is probably not the right approach. The ioctl has never done any postprocessing of the data returned by filesystems; it's really up to the filesystem to prepare the data.

OK, let's keep it in btrfs.

Signed-off-by: Qu Wenruo
---
v2:
  Since fiemap_extent_info has a limit on the number of fiemap extents, it's possible that fiemap_fill_next_extent() returns 1 halfway. Removed the WARN_ON() which can cause a kernel warning if fiemap is called on large c
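The cache-and-merge scheme discussed in this thread can be sketched as follows (a hypothetical Python helper, not the kernel code; the real patch works on extent buffers via try_merge_map()/fiemap_fill_next_extent() and checks flag compatibility rather than simply OR-ing flags): keep one cached extent, extend it while the next extent is logically and physically contiguous, and only "submit" it when a non-contiguous extent arrives.

```python
def merge_fiemap(extents):
    """Cache-and-merge sketch. Each extent is (logical, physical, length,
    flags); contiguous extents are coalesced before being 'submitted'
    (appended to the output, standing in for fiemap_fill_next_extent())."""
    out = []
    cached = None
    for logical, phys, length, flags in extents:
        if (cached and cached[0] + cached[2] == logical
                and cached[1] + cached[2] == phys):
            # contiguous with the cached extent: extend the cache,
            # OR-ing flags (simplification; see lead-in note)
            cached = (cached[0], cached[1], cached[2] + length, cached[3] | flags)
        else:
            if cached:
                out.append(cached)  # unmergeable: submit the cached extent
            cached = (logical, phys, length, flags)
    if cached:
        out.append(cached)          # submit the final cached extent
    return out

# The four contiguous 16K extents from the bug report merge into one:
K = 1024
split = [(0 * K, 25088 * 512 + 0 * K, 16 * K, 0x0),
         (16 * K, 25088 * 512 + 16 * K, 16 * K, 0x0),
         (32 * K, 25088 * 512 + 32 * K, 16 * K, 0x0),
         (48 * K, 25088 * 512 + 48 * K, 16 * K, 0x1)]
print(merge_fiemap(split))  # -> [(0, 12845056, 65536, 1)] -- one 64K extent
```

Note how nothing is emitted until a merge fails or the input ends, which is exactly why the scheme still merges fully even when fi_extents_max is 1: the limit is only consulted at submit time.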
Re: [PATCH] fstests: introduce btrfs-map-logical
At 04/12/2017 08:52 PM, David Sterba wrote:
On Wed, Apr 12, 2017 at 02:32:02PM +0200, David Sterba wrote:
On Wed, Apr 12, 2017 at 09:35:00AM +0800, Qu Wenruo wrote:
At 04/12/2017 09:27 AM, Liu Bo wrote:

A typical use case of 'btrfs-map-logical' is to translate a btrfs logical address to a physical address on each disk.

Could we avoid usage of btrfs-map-logical here?

Agreed. I understand that we need to do corruption so that we can test whether the repair works, but I'm not sure if the output format will change, or if the program will get replaced by the "btrfs inspect-internal" group.

In the long term it will be replaced, but there's no ETA.

Possibly, if the fstests maintainer agrees, we can add btrfs-map-logical to fstests. It's small and uses headers from libbtrfs, so this would become a new dependency, but I believe it is still bearable.

I'm not sure we should export all debugging functionality in 'btrfs', as this is typically something that a user will never want, not even in emergency environments. There's an overlap in the information to be exported, but I'd be more inclined to satisfy user needs than testsuite needs, so an independent tool would give us more freedom on both sides.

I'm working on the new btrfs-corrupt-block equivalent. Considering the demand to corrupt on-disk data for recovery tests, I could provide a tool with fundamental corruption support: it could corrupt on-disk data specified either by (root, inode, offset, length) or just by (logical address, length), and support corrupting a given mirror or even P/Q for RAID56 (with btrfs_map_block_v2 from offline scrub).

I'm not sure whether I should just replace btrfs-corrupt-block, add a new individual program, or add a btrfs subcommand group that is disabled by default?

Thanks,
Qu
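The core of such a corruption tool is tiny once the hard part, mapping a logical address or (root, inode, offset) tuple to per-mirror physical locations, is done elsewhere (e.g. by btrfs-map-logical). A minimal sketch of that final step might look like this (a hypothetical helper, not any existing tool; it simply overwrites bytes at a known physical offset in an image file so repair code paths can be exercised):

```python
import os
import tempfile

def corrupt(image_path, offset, length, pattern=0xAA):
    """Overwrite `length` bytes at a known *physical* offset with a fixed
    pattern. Hypothetical helper: a real tool would first resolve the
    logical address to this offset, per mirror, before writing."""
    with open(image_path, "r+b") as f:
        f.seek(offset)
        f.write(bytes([pattern]) * length)

# demo on a scratch file standing in for a device image
fd, img = tempfile.mkstemp()
os.close(fd)
with open(img, "wb") as f:
    f.write(b"\x00" * 4096)    # pristine 4K "image"
corrupt(img, 1024, 16)          # flip 16 bytes at offset 1K
with open(img, "rb") as f:
    data = f.read()
print(data[1024:1040])          # 16 bytes of 0xaa, surroundings untouched
os.remove(img)
```

The open question in the thread (replace btrfs-corrupt-block, ship a new program, or hide a subcommand group) is orthogonal to this mechanism; only the address-resolution front end differs.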
Re: send snapshot from snapshot incremental
On 2017-03-26 at 22:07, Peter Grandi wrote:
> [ ... ]
>> BUT if I take a snapshot from the system, and want to transfer
>> it to the external HD, I can not set a parent subvolume,
>> because there isn't any.
>
> Questions like this are based on incomplete understanding of
> 'send' and 'receive', and on IRC user "darkling" explained it
> fairly well:
>
>> When you use -c, you're telling the FS that it can expect to
>> find a sent copy of that subvol on the receiving side, and
>> that anything shared with it can be sent by reference. OK, so
>> with -c on its own, you're telling the FS that "all the data
>> in this subvol already exists on the remote".
>
>> So, when you send your subvol, *all* of the subvol's metadata
>> is sent, and where that metadata refers to an extent that's
>> shared with the -c subvol, the extent data isn't sent, because
>> it's known to be on the other end already, and can be shared
>> directly from there.
>
>> OK. So, with -p, there's a "base" subvol. The send subvol and
>> the -p reference subvol are both snapshots of that base (at
>> different times). The -p reference subvol, as with -c, is
>> assumed to be on the remote FS. However, because it's known to
>> be an earlier version of the same data, you can be more
>> efficient in the sending by saying "start from the earlier
>> version, and modify it in this way to get the new version".
>
>> So, with -p, not all of the metadata is sent, because you know
>> you've already got most of it on the remote in the form of the
>> earlier version.
>
>> So -p is "take this thing and apply these differences to it"
>> and -c is "build this thing from scratch, but you can share
>> some of the data with these sources".

For now, I think I got it... (maybe).
I put the following logic into my script:

1) Search for all subvolumes on the local and remote side where the Received-UUID on the remote side is the same as the UUID on the local side.

2) Take the parent UUID from the snapshot I want to transfer and search the list from 1) for which snapshots (on the local side) have the same parent UUID.

3) Take the youngest snapshot from 2) and set it as parent for the btrfs send command.

4) Search for snapshots, local and remote, which have the same name/path and "basename" as the snapshot I want to transfer. By basename I mean: my system subvolume is called @debian and it contains one subvolume @debian/var/spool; the snapshot names are @debian_$TIMESTAMP and @debian_$TIMESTAMP/var/spool; the basenames are @debian and @debian/var/spool.

5) Set all of the snapshots with the same basename as the snapshot to be transferred as clones for btrfs send.

The final command uses the youngest "sister" of the snapshot I want to transfer that exists on both sides, set as parent, plus a bunch of snapshots which are older (or even younger - is this a problem???) than the snapshot I want to transfer and which contain both modified and identical data, set as clones.

If there is no parent (in the case of transferring a snapshot of a snapshot...), then there are clones of this snapshot, so not all of the data has to be sent again (and consume double the space on the backup media).

If there are no parents AND no clones (similar snapshots), the subvolume seems to be totally new, and the whole thing must be transferred.

If there is a parent and clones, both are used to minimize the data for the transfer and to use as much as possible of the existing data/metadata on the backup media to build the new snapshot there.

Using all of the similar snapshots (found by snapshot name) as clones seems to speed up the transfer compared to only using the parent (which seems slower). Could this be, or is it only a "feeling"?

Thanks for all your advice. This helped me a lot!!
regards Jakob
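Jakob's five steps can be sketched as follows (hypothetical data model and helper names; a real script would fill these fields by parsing `btrfs subvolume list`/`btrfs subvolume show` output, and basename matching over nested subvolumes is simplified to a stored field):

```python
def pick_parent_and_clones(local, remote, snap):
    """Sketch of the selection logic: each snapshot is a dict with
    uuid, parent_uuid, basename, ctime, path. Returns (parent, clones)."""
    # 1) local snapshots already on the remote: a remote subvolume's
    #    Received-UUID equals the local subvolume's UUID
    on_remote = {r["received_uuid"] for r in remote if r.get("received_uuid")}
    transferred = [s for s in local if s["uuid"] in on_remote]
    # 2)+3) siblings sharing snap's parent UUID; the youngest becomes -p
    siblings = [s for s in transferred
                if s["parent_uuid"] == snap["parent_uuid"] and s is not snap]
    parent = max(siblings, key=lambda s: s["ctime"])["path"] if siblings else None
    # 4)+5) every transferred snapshot with the same basename becomes -c
    clones = [s["path"] for s in transferred
              if s["basename"] == snap["basename"] and s["path"] != parent]
    return parent, clones

def build_send_cmd(snap, parent, clones):
    cmd = ["btrfs", "send"]
    if parent:
        cmd += ["-p", parent]
    for c in clones:
        cmd += ["-c", c]
    return cmd + [snap["path"]]

local = [
    {"path": "@debian_1", "uuid": "u1", "parent_uuid": "base", "basename": "@debian", "ctime": 1},
    {"path": "@debian_2", "uuid": "u2", "parent_uuid": "base", "basename": "@debian", "ctime": 2},
    {"path": "@debian_3", "uuid": "u3", "parent_uuid": "base", "basename": "@debian", "ctime": 3},
]
remote = [{"received_uuid": "u1"}, {"received_uuid": "u2"}]
snap = local[2]  # @debian_3: newest snapshot, not yet on the remote
parent, clones = pick_parent_and_clones(local, remote, snap)
cmd = build_send_cmd(snap, parent, clones)
print(cmd)  # ['btrfs', 'send', '-p', '@debian_2', '-c', '@debian_1', '@debian_3']
```

With no qualifying sibling the function degrades exactly as described in the mail: clones only, or a full send when neither parent nor clones exist.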
Re: Btrfs disk layout question
12.04.2017 20:21, Chris Murphy wrote:
> btrfs-map-logical is the tool that will convert logical to physical
> and also give what device it's on; but the device notation is copy 1
> and copy 2, so you have to infer what device that is, it's not
> explicit.

Quickly checking the output, for my purposes it looks OK: BIS just tries to warn if a file is too far into the disk to be accessible by the BIOS, so I am not even interested in the specific device, just the maximum physical offset. Thank you!
Re: [PATCH] fstests: introduce btrfs-map-logical
On Wed, Apr 12, 2017 at 02:52:23PM +0200, David Sterba wrote:
> > > I understand that we need to do corruption so that we can test if the
> > > repair works, but I'm not sure if the output format will change, or if
> > > the program will get replaced by the "btrfs inspect-internal" group.
> >
> > In the long term it will be replaced, but there's no ETA.
>
> Possibly, if the fstests maintainer agrees, we can add btrfs-map-logical to
> fstests. It's small and uses headers from libbtrfs, so this would become
> a new dependency, but I believe it is still bearable.

IMHO, the ability to poke btrfs internals really should be provided by the btrfs-progs package and maintained by the btrfs community. fstests provides some fs-independent C helpers to assist testing, but it does not necessarily need to "understand" filesystem internals. For historical reasons, building fstests requires the xfsprogs development headers; we'd better not introduce new fs-specific dependencies.

Thanks,
Eryu