Re: [PATCH] fstests: generic test for fsync after file truncations
On Thu, Jun 25, 2015 at 04:18:15AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> Test that if we truncate a file to a smaller size, then truncate it to
> its original size or a larger size, then fsyncing it and a power failure
> happens, the file will have the range [first_truncate_size, last_size[
> with all bytes having a value of 0x00 if we read it the next time the
> filesystem is mounted.
>
> This test is motivated by a bug found in btrfs, which is fixed by a patch
> titled: "Btrfs: fix fsync after truncate when no_holes feature is enabled"
>
> Tested against ext3/4, xfs, btrfs (with and without the fix, and with the
> no_holes feature disabled), f2fs, reiserfs and nilfs2.
>
> All filesystems pass the test except for unpatched btrfs with the
> no_holes feature enabled (as expected) and f2fs. Both produce the
> following file contents that differ from the golden output:
>
> File foo content after log replay:
> 000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
> *
> 020 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
> *
> 0372000
> File bar content after log replay:
> 000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee
> *
> 020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> *
> 0372000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> *
> 0772000
>
> Signed-off-by: Filipe Manana
> ---
>  tests/generic/095     | 116 ++
>  tests/generic/095.out |  19 +
>  tests/generic/group   |   1 +
>  3 files changed, 136 insertions(+)
>  create mode 100755 tests/generic/095
>  create mode 100644 tests/generic/095.out
>
> diff --git a/tests/generic/095 b/tests/generic/095
> new file mode 100755
> index 000..bfd4112
> --- /dev/null
> +++ b/tests/generic/095
> @@ -0,0 +1,116 @@
> +#! /bin/bash
> +# FSQA Test No. 095
> +#
> +# Test that if we truncate a file to a smaller size, then truncate it to its
> +# original size or a larger size, then fsyncing it and a power failure happens,
> +# the file will have the range [first_truncate_size, last_size[ with all bytes
> +# having a value of 0x00 if we read it the next time the filesystem is mounted.
> +#
> +# This test is motivated by a bug found in btrfs.
> +#
> +#---
> +#
> +# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +    _cleanup_flakey
> +    rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_dm_flakey

Need _require_metadata_journaling here. Otherwise looks good to me.

Adding _require_metadata_journaling then

Reviewed-by: Eryu Guan

> +
> +# This test was motivated by an issue found in btrfs when the btrfs no-holes
> +# feature is enabled (introduced in kernel 3.14). So enable the feature if the
> +# fs being tested is btrfs.
> +if [ $FSTYP == "btrfs" ]; then
> +    _require_btrfs_fs_feature "no_holes"
> +    _require_btrfs_mkfs_feature "no-holes"
> +    MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
> +fi
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_init_flakey
> +_mount_flakey
> +
> +# Create our test files and make sure everything is durably persisted.
> +$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" \
> +                -c "pwrite -S 0xbb 64K 61K" \
> +                $SCRATCH_MNT/foo | _filter_xfs_io
> +$XFS_IO_PROG -f -c "pwrite -S 0xee 0 64K" \
> +                -c "pwrite -S 0xff 64K 61K" \
> +                $SCRATCH_MNT/bar | _filter_xfs_io
> +sync
> +
> +# Now truncate our file foo to a smaller size (64Kb) and then truncate it to the
> +# size it had before the shrinking truncate (125Kb). Then fsync our file. If a
> +# power failure happens after the fsync, we expect our file to have a size of
Re: [PATCH] fstests: generic test for truncating a file into the middle of a hole
On Thu, Jun 25, 2015 at 04:17:59AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> Test that after truncating a file into the middle of a hole causes the
> new size of the file to be persisted after a clean unmount of the
> filesystem (or after the inode is evicted). This is for the case where
> all the data following the hole is not yet durably persisted, that is,
> that data is only present in the page cache.
>
> This test is motivated by an issue found in btrfs, which got fixed by
> the patch titled:
>
>   "Btrfs: fix shrinking truncate when the no_holes feature is enabled"
>
> Signed-off-by: Filipe Manana

Looks good to me. Test failed on btrfs as expected and passed on
extN/xfs/nfs/cifs.

Reviewed-by: Eryu Guan

> ---
>  tests/generic/094     | 91 +++
>  tests/generic/094.out | 11 +++
>  tests/generic/group   |  1 +
>  3 files changed, 103 insertions(+)
>  create mode 100755 tests/generic/094
>  create mode 100644 tests/generic/094.out
>
> diff --git a/tests/generic/094 b/tests/generic/094
> new file mode 100755
> index 000..0876eb7
> --- /dev/null
> +++ b/tests/generic/094
> @@ -0,0 +1,91 @@
> +#! /bin/bash
> +# FSQA Test No. 094
> +#
> +# Test that after truncating a file into the middle of a hole causes the new
> +# size of the file to be persisted after a clean unmount of the filesystem (or
> +# after the inode is evicted). This is for the case where all the data following
> +# the hole is not yet durably persisted, that is, that data is only present in
> +# the page cache.
> +#
> +# This test is motivated by an issue found in btrfs.
> +#
> +#---
> +#
> +# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +    rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +
> +# This test was motivated by an issue found in btrfs when the btrfs no-holes
> +# feature is enabled (introduced in kernel 3.14). So enable the feature if the
> +# fs being tested is btrfs.
> +if [ $FSTYP == "btrfs" ]; then
> +    _require_btrfs_fs_feature "no_holes"
> +    _require_btrfs_mkfs_feature "no-holes"
> +    MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
> +fi
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +
> +# Create our test file with some data and durably persist it.
> +$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 128K" $SCRATCH_MNT/foo | _filter_xfs_io
> +sync
> +
> +# Append some data to the file, increasing its size, and leave a hole between
> +# the old size and the start offset if the following write. So our file gets
> +# a hole in the range [128Kb, 256Kb[.
> +$XFS_IO_PROG -c "pwrite -S 0xbb 256K 32K" $SCRATCH_MNT/foo | _filter_xfs_io
> +
> +# Now truncate our file to a smaller size that is in the middle of the hole we
> +# previously created. On most truncate implementations the data we appended
> +# before gets discarded from memory (with truncate_setsize()) and never ends
> +# up being written to disk.
> +$XFS_IO_PROG -c "truncate 160K" $SCRATCH_MNT/foo
> +
> +_scratch_remount
> +
> +# We expect to see a file with a size of 160Kb, with the first 128Kb of data all
> +# having the value 0xaa and the remaining 32Kb of data all having the value 0x00
> +echo "File content after remount:"
> +od -t x1 $SCRATCH_MNT/foo
> +
> +status=0
> +exit
> diff --git a/tests/generic/094.out b/tests/generic/094.out
> new file mode 100644
> index 000..47d1b63
> --- /dev/null
> +++ b/tests/generic/094.out
> @@ -0,0 +1,11 @@
> +QA output created by 094
> +wrote 131072/131072 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 32768/32768 bytes at offset 262144
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +File content
Re: [PATCH 0/2] btrfs device remove alias
Anand Jain posted on Fri, 26 Jun 2015 09:10:56 +0800 as excerpted:

> while on this. its also good idea to create alias for
>
>    btrfs replace start -> btrfs device replace.
>
> any comments ?

That's actually the one that makes more sense to me.

Delete/remove/subtract, all about the same to me, so while I'm not
opposed to alias for that, I don't really see the need.

But with btrfs device add/remove, having btrfs replace instead of btrfs
device replace, makes absolutely no sense to me, so I'm all for that
alias.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: Improve FL_KEEP_SIZE handling in fallocate.
On Mon, Apr 06, 2015 at 10:09:15PM -0700, Davide Italiano wrote:
> - We call inode_size_ok() only if FL_KEEP_SIZE isn't specified.
> - As an optimisation we can skip the call if (off + len)
>   isn't greater than the current size of the file. This operation
>   is called under the lock so the less work we do, the better.
> - If we call inode_size_ok() pass to it the correct value rather
>   than a more conservative estimation.

Looks good.

Reviewed-by: Liu Bo

>
> Signed-off-by: Davide Italiano
> ---
>  fs/btrfs/file.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 30982bb..f649bfc 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2586,9 +2586,13 @@ static long btrfs_fallocate(struct file *file, int mode,
>  	}
>
>  	mutex_lock(&inode->i_mutex);
> -	ret = inode_newsize_ok(inode, alloc_end);
> -	if (ret)
> -		goto out;
> +
> +	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
> +	    offset + len > inode->i_size) {
> +		ret = inode_newsize_ok(inode, offset + len);
> +		if (ret)
> +			goto out;
> +	}
>
>  	if (alloc_start > inode->i_size) {
>  		ret = btrfs_cont_expand(inode, i_size_read(inode),
> --
> 2.3.4
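For readers following the thread, the condition the patch introduces can be sketched outside the kernel as a plain predicate. This is only an illustration, not the btrfs code: needs_newsize_check() and SKETCH_FL_KEEP_SIZE are hypothetical names, and the sketch assumes signed 64-bit offsets that do not overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's FALLOC_FL_KEEP_SIZE flag. */
#define SKETCH_FL_KEEP_SIZE 0x01

/*
 * Return true when fallocate(mode, offset, len) on a file of i_size
 * bytes would grow the file, i.e. when the new size has to be validated
 * (inode_newsize_ok() in the patch) before allocating.
 */
static bool needs_newsize_check(int mode, int64_t offset, int64_t len,
                                int64_t i_size)
{
    /* Assumption of this sketch: caller passes a sane, positive range. */
    assert(offset >= 0 && len > 0);

    if (mode & SKETCH_FL_KEEP_SIZE)
        return false;               /* i_size is left untouched */

    return offset + len > i_size;   /* only a growing range needs the check */
}
```

With this shape, a range that ends exactly at the current i_size skips the check, which matches the "isn't greater than the current size" wording in the changelog.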
Re: [PATCH] Btrfs: fix fsync after truncate when no_holes feature is enabled
On Thu, Jun 25, 2015 at 03:23:59PM +0100, Filipe Manana wrote:
> On Thu, Jun 25, 2015 at 3:20 PM, Liu Bo wrote:
> > On Thu, Jun 25, 2015 at 04:17:46AM +0100, fdman...@kernel.org wrote:
> >> From: Filipe Manana
> >>
> >> When we have the no_holes feature enabled, if a we truncate a file to a
> >> smaller size, truncate it again but to a size greater than or equals to
> >> its original size and fsync it, the log tree will not have any information
> >> about the hole covering the range [truncate_1_offset, new_file_size[.
> >> Which means if the fsync log is replayed, the file will remain with the
> >> state it had before both truncate operations.
> >
> > Does the fs/subvol tree get updated to the right information at this
> > time?
>
> No, and that's the problem. Because no file extent items are stored in
> the log tree.
> The inode item is updated with the new i_size however (as expected).

Yeap, that's right and the patch looks right.

I do appreciate your great work on fixing these corner cases, but as of
my understanding, they really can be taken by a force commit transaction,
do they deserve these complex stuff?

After all, like punch_hole, remove xattr, they're rare cases.

Thanks,

-liubo

>
> thanks
>
> >
> > Thanks,
> >
> > -liubo
> >>
> >> Without the no_holes feature this does not happen, since when the inode
> >> is logged (full sync flag is set) it will find in the fs/subvol tree a
> >> leaf with a generation matching the current transaction id that has an
> >> explicit extent item representing the hole.
> >>
> >> Fix this by adding an explicit extent item representing a hole between
> >> the last extent and the inode's i_size if we are doing a full sync.
> >>
> >> The issue is easy to reproduce with the following test case for fstests:
> >>
> >>   . ./common/rc
> >>   . ./common/filter
> >>   . ./common/dmflakey
> >>
> >>   _need_to_be_root
> >>   _supported_fs generic
> >>   _supported_os Linux
> >>   _require_scratch
> >>   _require_dm_flakey
> >>
> >>   # This test was motivated by an issue found in btrfs when the btrfs
> >>   # no-holes feature is enabled (introduced in kernel 3.14). So enable
> >>   # the feature if the fs being tested is btrfs.
> >>   if [ $FSTYP == "btrfs" ]; then
> >>       _require_btrfs_fs_feature "no_holes"
> >>       _require_btrfs_mkfs_feature "no-holes"
> >>       MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
> >>   fi
> >>
> >>   rm -f $seqres.full
> >>
> >>   _scratch_mkfs >>$seqres.full 2>&1
> >>   _init_flakey
> >>   _mount_flakey
> >>
> >>   # Create our test files and make sure everything is durably persisted.
> >>   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" \
> >>                   -c "pwrite -S 0xbb 64K 61K" \
> >>                   $SCRATCH_MNT/foo | _filter_xfs_io
> >>   $XFS_IO_PROG -f -c "pwrite -S 0xee 0 64K" \
> >>                   -c "pwrite -S 0xff 64K 61K" \
> >>                   $SCRATCH_MNT/bar | _filter_xfs_io
> >>   sync
> >>
> >>   # Now truncate our file foo to a smaller size (64Kb) and then truncate
> >>   # it to the size it had before the shrinking truncate (125Kb). Then
> >>   # fsync our file. If a power failure happens after the fsync, we expect
> >>   # our file to have a size of 125Kb, with the first 64Kb of data having
> >>   # the value 0xaa and the second 61Kb of data having the value 0x00.
> >>   $XFS_IO_PROG -c "truncate 64K" \
> >>                -c "truncate 125K" \
> >>                -c "fsync" \
> >>                $SCRATCH_MNT/foo
> >>
> >>   # Do something similar to our file bar, but the first truncation sets
> >>   # the file size to 0 and the second truncation expands the size to the
> >>   # double of what it was initially.
> >>   $XFS_IO_PROG -c "truncate 0" \
> >>                -c "truncate 253K" \
> >>                -c "fsync" \
> >>                $SCRATCH_MNT/bar
> >>
> >>   _load_flakey_table $FLAKEY_DROP_WRITES
> >>   _unmount_flakey
> >>
> >>   # Allow writes again, mount to trigger log replay and validate file
> >>   # contents.
> >>   _load_flakey_table $FLAKEY_ALLOW_WRITES
> >>   _mount_flakey
> >>
> >>   # We expect foo to have a size of 125Kb, the first 64Kb of data all
> >>   # having the value 0xaa and the remaining 61Kb to be a hole (all bytes
> >>   # with value 0x00).
> >>   echo "File foo content after log replay:"
> >>   od -t x1 $SCRATCH_MNT/foo
> >>
> >>   # We expect bar to have a size of 253Kb and no extents (any byte read
> >>   # from bar has the value 0x00).
> >>   echo "File bar content after log replay:"
> >>   od -t x1 $SCRATCH_MNT/bar
> >>
> >>   status=0
> >>   exit
> >>
> >> The expected file contents in the golden output are:
> >>
> >>   File foo content after log replay:
> >>   000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
> >>   *
> >>   020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>   *
> >>   0372000
> >>   File bar content after log replay:
> >>   000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>   *
> >>   0772000
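The user-visible contract that the quoted test checks for file foo can be sketched in C with plain POSIX calls. This is only a rough illustration under stated assumptions: it does not simulate a power failure or log replay (the fstests case uses dm-flakey for that), it only verifies what a reader should see after the shrinking and expanding truncates. The function names and the /tmp template are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define KIB 1024

/* Write len bytes (a multiple of KIB) of value `byte` at offset off. */
static int fill(int fd, unsigned char byte, off_t off, size_t len)
{
    unsigned char buf[KIB];

    memset(buf, byte, sizeof(buf));
    for (size_t done = 0; done < len; done += sizeof(buf))
        if (pwrite(fd, buf, sizeof(buf), off + (off_t)done) != (ssize_t)sizeof(buf))
            return -1;
    return 0;
}

/* Return 1 if [off, off + len) holds only `byte`, 0 if not, -1 on error. */
static int range_is(int fd, unsigned char byte, off_t off, size_t len)
{
    unsigned char buf[KIB];

    for (size_t done = 0; done < len; done += sizeof(buf)) {
        if (pread(fd, buf, sizeof(buf), off + (off_t)done) != (ssize_t)sizeof(buf))
            return -1;
        for (size_t i = 0; i < sizeof(buf); i++)
            if (buf[i] != byte)
                return 0;
    }
    return 1;
}

/*
 * Mirror the "foo" steps: 64K of 0xaa plus 61K of 0xbb, fsync, truncate
 * to 64K, truncate back to 125K, fsync. Return 1 if the file then reads
 * as 64K of 0xaa followed by 61K of zeroes, 0 if not, -1 on error.
 */
int truncate_roundtrip_ok(void)
{
    char path[] = "/tmp/trunc-sketch-XXXXXX";   /* hypothetical location */
    int fd = mkstemp(path);
    int ok = -1;

    if (fd < 0)
        return -1;
    unlink(path);                               /* file vanishes on close */

    if (fill(fd, 0xaa, 0, 64 * KIB) == 0 &&
        fill(fd, 0xbb, 64 * KIB, 61 * KIB) == 0 &&
        fsync(fd) == 0 &&
        ftruncate(fd, 64 * KIB) == 0 &&
        ftruncate(fd, 125 * KIB) == 0 &&
        fsync(fd) == 0) {
        ok = lseek(fd, 0, SEEK_END) == 125 * KIB &&
             range_is(fd, 0xaa, 0, 64 * KIB) == 1 &&
             range_is(fd, 0x00, 64 * KIB, 61 * KIB) == 1;
    }
    close(fd);
    return ok;
}
```

The bug discussed in the thread only shows up after replaying the fsync log, so a sketch like this passing says nothing about crash consistency; it merely pins down the expected contents.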
Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
Tested-by: Wang Yanfeng

On 06/26/2015 12:35 AM, Omar Sandoval wrote:
> On Thu, Jun 25, 2015 at 01:03:57PM +0800, wangyf wrote:
>> I confirmed this bug report, and found the reason is that
>> I compiled the patched module with a dirty kernel.
>> This morning I tested this patch again, and didn't see above error,
>> this patch is OK.
>> Sorry for this bug report. : (
>
> It's no problem! Do either of you feel like providing your Tested-by?
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
Qu Wenruo wrote on 2015/06/26 09:54 +0800:
> Robert Munteanu wrote on 2015/06/12 15:19 +0300:
>> Hi,
>>
>> I have converted my root ext4 partition to btrfs. I used a USB stick
>> to boot and used btrfs-convert. I also did a balance and defrag (in
>> that order), both when the fs was mounted.
>>
>> After logging in to KDE I quickly get a read-only filesystem. I've
>> pasted the backtrace below
>>
>> Jun 11 23:13:08 mars kernel: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120 [btrfs]()
>> Jun 11 23:13:08 mars kernel: BTRFS: Transaction aborted (error -95)
>> Jun 11 23:13:08 mars kernel: Modules linked in: bnep bluetooth rfkill fuse vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) af_packet nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables xfs libcrc32c snd_hda_codec_hdmi raid1 md_mod gpio_ich ppdev iTCO_wdt iTCO_vendor_support coretemp snd_hda_codec_realtek snd_hda_codec_generic kvm_intel snd_hda_intel dm_mod kvm snd_hda_controller snd_hda_codec snd_hwdep serio_raw pcspkr snd_pcm i2c_i801 snd_seq joydev snd_seq_device snd_timer snd 8250_fintek parport_pc parport acpi_cpufreq lpc_ich
>> Jun 11 23:13:08 mars kernel: soundcore mfd_core shpchp processor ata_generic btrfs hid_logitech_hidpp xor raid6_pq sr_mod cdrom nvidia_uvm(PO) nvidia(PO) firewire_ohci firewire_core crc_itu_t uas usb_storage r8169 mii pata_jmicron hid_logitech_dj drm button sg
>> Jun 11 23:13:08 mars kernel: CPU: 2 PID: 2777 Comm: kworker/u8:0 Tainted: P O 4.0.4-3-desktop #1
>> Jun 11 23:13:08 mars kernel: Hardware name: Gigabyte Technology Co., Ltd. EP35-DS4/EP35-DS4, BIOS F6d 01/08/2009
>> Jun 11 23:13:08 mars kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
>> Jun 11 23:13:08 mars kernel: a0a92832 8167c4aa 880128513ca8
>> Jun 11 23:13:08 mars kernel: 81063bb1 880031929d28 880221e71800 ffa1
>> Jun 11 23:13:08 mars kernel: a0a914e0 0b50 81063c2a a0a95928
>> Jun 11 23:13:08 mars kernel: Call Trace:
>> Jun 11 23:13:08 mars kernel: [] dump_trace+0x8c/0x340
>> Jun 11 23:13:08 mars kernel: [] show_stack_log_lvl+0xa3/0x190
>> Jun 11 23:13:08 mars kernel: [] show_stack+0x21/0x50
>> Jun 11 23:13:08 mars kernel: [] dump_stack+0x47/0x67
>> Jun 11 23:13:08 mars kernel: [] warn_slowpath_common+0x81/0xb0
>> Jun 11 23:13:08 mars kernel: [] warn_slowpath_fmt+0x4a/0x50
>> Jun 11 23:13:08 mars kernel: [] __btrfs_abort_transaction+0x4b/0x120 [btrfs]
>> Jun 11 23:13:08 mars kernel: [] btrfs_finish_ordered_io+0x5aa/0x620 [btrfs]
>> Jun 11 23:13:08 mars kernel: [] normal_work_helper+0xc3/0x320 [btrfs]
>> Jun 11 23:13:08 mars kernel: [] process_one_work+0x142/0x420
>> Jun 11 23:13:08 mars kernel: [] worker_thread+0x114/0x460
>> Jun 11 23:13:08 mars kernel: [] kthread+0xc1/0xe0
>> Jun 11 23:13:08 mars kernel: [] ret_from_fork+0x58/0x90
>> Jun 11 23:13:08 mars kernel: ---[ end trace 4c4eb7d6e98afa91 ]---
>> Jun 11 23:13:08 mars kernel: BTRFS: error (device sda1) in btrfs_finish_ordered_io:2896: errno=-95 unknown
>
> IIRC someone on the mailing list has reported the same bug.
> I'm still not sure of the root cause, but it seems highly related to
> the converted fs.
>
> It would be much better if you have a clue how to trigger the bug,
> such as which file(s), when read or written, cause it.
>
> If it's OK for you, please upload the btrfs-debug-tree output.
> WARNING: This output will not contain any data, but it will contain
> all your file and directory names.
>
> My first guess is that some btrfs code can't handle the special extents
> converted from ext* well, but it's still quite hard to say, even though
> the errno (EOPNOTSUPP) is quite unique and its source is easy to find :(
>
> Thanks,
> Qu

A quick code search leads me to inline extents.

So, if you still have the original ext* image, would you please try
reverting to ext* and then converting it to btrfs again?

But this time, please convert with the --no-inline option, and see if
this removes the problem.

Thanks,
Qu

>> Jun 11 23:13:08 mars kernel: BTRFS info (device sda1): forced readonly
>>
>> Some diagnostic info:
>>
>> - btrfs scrub reports no errors
>> - on the host machine I'm running btrfs v4.0+20150429 and kernel
>>   4.0.4-3-desktop
>> - on the live medium, used to run btrfs-convert, I was running btrfs
>>   v4.0+20150429 and kernel 4.0.3-1-default
>>
>> # btrfs fi show
>> Label: none uuid: 54dea125-74cd-4bb2-86a2-f7bc645b76cf
>>     Total devices 1 FS bytes used 90.22GiB
>>     devid 1 size 223.57GiB used 92.03GiB path /dev/sda1
>>
>> btrfs-progs v4.0+20150429
>>
>> # btrfs fi df /
>> Data, single: total=89.00GiB, used=88.17GiB
>> System, single: total=32.00MiB, used=16.00KiB
>> Metadata, single: total=3.00GiB, used=2.05GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> Is there a way out? I still have the old ext4 image and can revert,
>> but I'm ke
Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
No problem.

Wang Yanfeng

cheers
wangyf

On 2015-06-26 00:35, Omar Sandoval wrote:
> On Thu, Jun 25, 2015 at 01:03:57PM +0800, wangyf wrote:
>> I confirmed this bug report, and found the reason is that
>> I compiled the patched module with a dirty kernel.
>> This morning I tested this patch again, and didn't see above error,
>> this patch is OK.
>> Sorry for this bug report. : (
>
> It's no problem! Do either of you feel like providing your Tested-by?
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
Robert Munteanu wrote on 2015/06/12 15:19 +0300:
> Hi,
>
> I have converted my root ext4 partition to btrfs. I used a USB stick
> to boot and used btrfs-convert. I also did a balance and defrag (in
> that order), both when the fs was mounted.
>
> After logging in to KDE I quickly get a read-only filesystem. I've
> pasted the backtrace below
>
> Jun 11 23:13:08 mars kernel: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120 [btrfs]()
> Jun 11 23:13:08 mars kernel: BTRFS: Transaction aborted (error -95)
> Jun 11 23:13:08 mars kernel: Modules linked in: bnep bluetooth rfkill fuse vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) af_packet nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables xfs libcrc32c snd_hda_codec_hdmi raid1 md_mod gpio_ich ppdev iTCO_wdt iTCO_vendor_support coretemp snd_hda_codec_realtek snd_hda_codec_generic kvm_intel snd_hda_intel dm_mod kvm snd_hda_controller snd_hda_codec snd_hwdep serio_raw pcspkr snd_pcm i2c_i801 snd_seq joydev snd_seq_device snd_timer snd 8250_fintek parport_pc parport acpi_cpufreq lpc_ich
> Jun 11 23:13:08 mars kernel: soundcore mfd_core shpchp processor ata_generic btrfs hid_logitech_hidpp xor raid6_pq sr_mod cdrom nvidia_uvm(PO) nvidia(PO) firewire_ohci firewire_core crc_itu_t uas usb_storage r8169 mii pata_jmicron hid_logitech_dj drm button sg
> Jun 11 23:13:08 mars kernel: CPU: 2 PID: 2777 Comm: kworker/u8:0 Tainted: P O 4.0.4-3-desktop #1
> Jun 11 23:13:08 mars kernel: Hardware name: Gigabyte Technology Co., Ltd. EP35-DS4/EP35-DS4, BIOS F6d 01/08/2009
> Jun 11 23:13:08 mars kernel: Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> Jun 11 23:13:08 mars kernel: a0a92832 8167c4aa 880128513ca8
> Jun 11 23:13:08 mars kernel: 81063bb1 880031929d28 880221e71800 ffa1
> Jun 11 23:13:08 mars kernel: a0a914e0 0b50 81063c2a a0a95928
> Jun 11 23:13:08 mars kernel: Call Trace:
> Jun 11 23:13:08 mars kernel: [] dump_trace+0x8c/0x340
> Jun 11 23:13:08 mars kernel: [] show_stack_log_lvl+0xa3/0x190
> Jun 11 23:13:08 mars kernel: [] show_stack+0x21/0x50
> Jun 11 23:13:08 mars kernel: [] dump_stack+0x47/0x67
> Jun 11 23:13:08 mars kernel: [] warn_slowpath_common+0x81/0xb0
> Jun 11 23:13:08 mars kernel: [] warn_slowpath_fmt+0x4a/0x50
> Jun 11 23:13:08 mars kernel: [] __btrfs_abort_transaction+0x4b/0x120 [btrfs]
> Jun 11 23:13:08 mars kernel: [] btrfs_finish_ordered_io+0x5aa/0x620 [btrfs]
> Jun 11 23:13:08 mars kernel: [] normal_work_helper+0xc3/0x320 [btrfs]
> Jun 11 23:13:08 mars kernel: [] process_one_work+0x142/0x420
> Jun 11 23:13:08 mars kernel: [] worker_thread+0x114/0x460
> Jun 11 23:13:08 mars kernel: [] kthread+0xc1/0xe0
> Jun 11 23:13:08 mars kernel: [] ret_from_fork+0x58/0x90
> Jun 11 23:13:08 mars kernel: ---[ end trace 4c4eb7d6e98afa91 ]---
> Jun 11 23:13:08 mars kernel: BTRFS: error (device sda1) in btrfs_finish_ordered_io:2896: errno=-95 unknown

IIRC someone on the mailing list has reported the same bug.
I'm still not sure of the root cause, but it seems highly related to the
converted fs.

It would be much better if you have a clue how to trigger the bug, such
as which file(s), when read or written, cause it.

If it's OK for you, please upload the btrfs-debug-tree output.
WARNING: This output will not contain any data, but it will contain all
your file and directory names.

My first guess is that some btrfs code can't handle the special extents
converted from ext* well, but it's still quite hard to say, even though
the errno (EOPNOTSUPP) is quite unique and its source is easy to find :(

Thanks,
Qu

> Jun 11 23:13:08 mars kernel: BTRFS info (device sda1): forced readonly
>
> Some diagnostic info:
>
> - btrfs scrub reports no errors
> - on the host machine I'm running btrfs v4.0+20150429 and kernel
>   4.0.4-3-desktop
> - on the live medium, used to run btrfs-convert, I was running btrfs
>   v4.0+20150429 and kernel 4.0.3-1-default
>
> # btrfs fi show
> Label: none uuid: 54dea125-74cd-4bb2-86a2-f7bc645b76cf
>     Total devices 1 FS bytes used 90.22GiB
>     devid 1 size 223.57GiB used 92.03GiB path /dev/sda1
>
> btrfs-progs v4.0+20150429
>
> # btrfs fi df /
> Data, single: total=89.00GiB, used=88.17GiB
> System, single: total=32.00MiB, used=16.00KiB
> Metadata, single: total=3.00GiB, used=2.05GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Is there a way out? I still have the old ext4 image and can revert,
> but I'm keeping the btrfs one for now, in case I can extract some
> useful debugging information from it.
>
> Thanks,
>
> Robert
Re: [PATCH 0/2] btrfs device remove alias
While on this, it's also a good idea to create an alias for

   btrfs replace start -> btrfs device replace.

Any comments?

On 06/25/2015 12:09 AM, Omar Sandoval wrote:
> The opposite of btrfs device add is btrfs device delete. This really
> should be btrfs device remove.
>
> Changes from v1:
>
> - Add support for flags to cmd_struct and a CMD_ALIAS flag which only
>   prints the one-line usage string
> - Rearrange the command wrappers in a way that could be made generic if
>   needed
>
> Thanks!
>
> Omar Sandoval (2):
>   btrfs-progs: replace struct cmd_group->hidden with flags
>   btrfs-progs: alias btrfs device delete to btrfs device remove
>
>  Documentation/btrfs-device.asciidoc |  5 -
>  cmds-device.c                       | 35 ++-
>  cmds-filesystem.c                   |  2 +-
>  commands.h                          |  9 +++--
>  help.c                              | 12 +++-
>  5 files changed, 45 insertions(+), 18 deletions(-)
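To make the "flags plus CMD_ALIAS" idea from the cover letter concrete, here is a small standalone sketch. It is not the btrfs-progs code: struct sketch_cmd, SKETCH_CMD_ALIAS, the table contents and the help strings are all hypothetical, and only the behaviour "an alias prints just the one-line usage string" is taken from the changelog above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag: the entry is an alias of another command. */
#define SKETCH_CMD_ALIAS (1u << 0)

struct sketch_cmd {
    const char *token;      /* word typed by the user */
    const char *usage;      /* one-line usage string (illustrative text) */
    const char *details;    /* longer help text (illustrative text) */
    unsigned int flags;
};

static const struct sketch_cmd sketch_device_cmds[] = {
    { "add",    "btrfs device add",    "Add a device to a filesystem.",      0 },
    { "remove", "btrfs device remove", "Remove a device from a filesystem.", 0 },
    { "delete", "btrfs device delete", "Remove a device from a filesystem.",
      SKETCH_CMD_ALIAS },
};

/* Look a subcommand up by name; NULL if it does not exist. */
static const struct sketch_cmd *sketch_find(const char *token)
{
    size_t n = sizeof(sketch_device_cmds) / sizeof(sketch_device_cmds[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(sketch_device_cmds[i].token, token) == 0)
            return &sketch_device_cmds[i];
    return NULL;
}

/* Format help text into buf; aliases only get the one-line usage. */
static int sketch_format_help(const struct sketch_cmd *cmd, char *buf, size_t len)
{
    assert(cmd != NULL && buf != NULL);

    if (cmd->flags & SKETCH_CMD_ALIAS)
        return snprintf(buf, len, "usage: %s\n", cmd->usage);
    return snprintf(buf, len, "usage: %s\n\n    %s\n", cmd->usage, cmd->details);
}
```

A flags word rather than a single hidden boolean leaves room for further per-command properties, which is presumably why the series replaces cmd_group->hidden first.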
Re: i_version vs iversion (Was: Re: [RFC PATCH v2 1/2] Btrfs: add noi_version option to disable MS_I_VERSION)
On Thu, Jun 25, 2015 at 02:46:44PM -0400, J. Bruce Fields wrote:
> > Does this sound reasonable?
>
> Just to make sure I understand, the logic is something like:
>
> 	to read the i_version:
>
> 		inode->i_version_seen = true;
> 		return inode->i_version
>
> 	to update the i_version:
>
> 		/*
> 		 * If nobody's seen this value of i_version then we can
> 		 * keep using it, otherwise we need a new one:
> 		 */
> 		if (inode->i_version_seen)
> 			inode->i_version++;
> 		inode->i_version_seen = false;

Yep, that's what I was proposing.

> Looks OK to me. As I say I'd expect i_version_seen == true to end up
> being the common case in a lot of v4 workloads, so I'm more skeptical of
> the claim of a performance improvement in the v4 case.

Well, so long as we require i_version to be committed to disk on every
single disk write, we're going to be trading off:

  * client-side performance of the advanced NFSv4 caching for reads
  * server-side performance for writes
  * data robustness in case of the server crashing and the client-side
    cache getting out of sync with the server after the crash

I don't see any way around that. (So for example, with lazy mtime
updates we wouldn't be updating the inode after every single
non-allocating write; enabling i_version updates will trash that
optimization.)

I just want to reduce to a bare minimum the performance hit in the case
where NFSv4 exports are not being used (since that is true in a very
*large* number of ext4 deployments --- i.e., every single Android
handset using ext4 :-), such that we can leave i_version updates turned
on by default.

> Could maintaining the new flag be a significant drag in itself? If not,
> then I guess we're not making things any worse there, so fine.

I don't think so; it's a bit in the in-memory inode, so I don't think
that should be an issue.

					- Ted
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
On Thu, Jun 25, 2015 at 08:17:12PM +, Ruslanas Gžibovskis wrote:
> nope. started from scratch! never upgrade from old running fs. Better to
> move 1 file and extend , move another files, extend and so on then just
> convert... It's like moving from windows and only boot partition to have on
> ext2 and other system on ntfs... all my friends do the same. And by the way
> does anyone still use ext3/4? isn't it dead? Even rhel now goes with
> default xfs...

Yes, plenty of people use ext4, it's not dead :)
It also just added built in encryption in the filesystem, which no other
filesystem has AFAIK.

Due to how btrfs works with block layouts, it shouldn't be hard to
encrypt blocks as they are written just like they can be compressed
currently, but no one has sponsored that work yet.

Because Google uses ext4 and they (we) care about encrypting data to
protect user data from things like possible hardware theft or maybe even
a datacenter being raided in some country, that's how ext4 got
encryption built in.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
Re: i_version vs iversion (Was: Re: [RFC PATCH v2 1/2] Btrfs: add noi_version option to disable MS_I_VERSION)
On Tue, Jun 23, 2015 at 12:32:41PM -0400, Theodore Ts'o wrote:
> On Thu, Jun 18, 2015 at 04:38:56PM +0200, David Sterba wrote:
> > Moving the discussion to fsdevel.
> >
> > Summary: disabling MS_I_VERSION brings some speedups to btrfs, but the
> > generic 'noiversion' option cannot be used to achieve that. It is
> > processed before it reaches btrfs superblock callback, where
> > MS_I_VERSION is forced.
> >
> > The proposed fix is to add btrfs-specific i_version/noi_version to btrfs,
> > to which I object.
>
> I was talking to Mingming about this on today's ext4 conference call,
> and one of the reasons why ext4 turns off i_version update by default
> is because it does a real number on our performance as well --- and
> furthermore, the only real user of the field from what we can tell is
> NFSv4, which not all that many ext4 users actually care about.
>
> This has caused pain for the nfsv4 folks since it means that they need
> to tell people to use a special mount option for ext4 if they are
> actually using this for nfsv4, and I suspect they won't be all that
> eager to hear that btrfs is going to go the same way.

Yes, thanks for looking into this!

> This however got us thinking --- even if NFSv4 is depending on
> i_version, it doesn't actually _look_ at that field all that often.

Most clients will query it on every write. (I just took a quick look at
the code and I believe the Linux client's requesting it immediately
after every write, except in the O_DIRECT and delegated cases.)

> It's only going to look at it in a response to a client's getattr
> call, and that in turn is used so the client can do its local disk
> cache invalidation if any of the data blocks of the inode has changed.
>
> So what if we have a per-inode flag which "don't update I_VERSION",
> which is off by default, but after the i_version has been updated at
> least once, is set, so the i_version field won't be updated again ---
> at least until something has actually looked at the i_version field,
> when the "don't update I_VERSION" flag will get cleared again.
>
> So basically, if we know there are no microphones in the forest, we
> don't need to make the tree fall. However, if someone has sampled the
> i_version field, then the next time the inode gets updated, we will
> update the i_version field so the NFSv4 client can hear the sound of
> the tree crashing to the forest floor and so it can invalidate its
> local cache of the file. :-)
>
> This should significantly improve the performance of using the
> i_version field if the file system is being exported via NFSv4, and if
> NFSv4 is not in use, no one will be looking at the i_version field, so
> the performance impact will be very slight, and thus we could enable
> i_version updates by default for btrfs and ext4.
>
> And this should make the distribution folks happy, since it will unify
> the behavior of all file systems, and make life easier for users who
> won't need to set certain magic mount options depending on what file
> system they are using and whether they are using NFSv4 or not.
>
> Does this sound reasonable?

Just to make sure I understand, the logic is something like:

    to read the i_version:

        inode->i_version_seen = true;
        return inode->i_version

    to update the i_version:

        /*
         * If nobody's seen this value of i_version then we can
         * keep using it, otherwise we need a new one:
         */
        if (inode->i_version_seen)
            inode->i_version++;
        inode->i_version_seen = false;

Looks OK to me. As I say I'd expect i_version_seen == true to end up
being the common case in a lot of v4 workloads, so I'm more skeptical of
the claim of a performance improvement in the v4 case.

Could maintaining the new flag be a significant drag in itself? If not,
then I guess we're not making things any worse there, so fine.

--b.
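The read/update logic discussed in this thread can be written out as a small, self-contained C sketch. This is only an illustration of the proposal as described in the message; `struct sketch_inode` and both function names are hypothetical stand-ins, not the kernel's `struct inode` or any existing kernel API, and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the in-memory inode (not the kernel struct). */
struct sketch_inode {
    uint64_t i_version;      /* change counter handed out to readers */
    bool     i_version_seen; /* has anyone sampled the current value? */
};

/* Reader side: sampling the counter arms the next update. */
uint64_t sketch_read_iversion(struct sketch_inode *inode)
{
    assert(inode != NULL);
    inode->i_version_seen = true;
    return inode->i_version;
}

/*
 * Writer side: bump the counter only if the current value was observed.
 * Returns true when the counter changed, i.e. when the inode would have
 * to be written back for the new value to reach disk.
 */
bool sketch_update_iversion(struct sketch_inode *inode)
{
    bool bumped;

    assert(inode != NULL);
    bumped = inode->i_version_seen;
    if (bumped)
        inode->i_version++;
    inode->i_version_seen = false;
    return bumped;
}
```

With no reader in between, any number of updates leaves the counter alone. If a reader samples the value after every write, as described above for the Linux NFSv4 client, every update bumps the counter, which is the case where the flag saves nothing.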
Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
On Thu, Jun 25, 2015 at 02:52:50PM +0200, David Sterba wrote:
> On Wed, Jun 24, 2015 at 04:17:32PM -0400, Zygo Blaxell wrote:
> > Is there any sane use case where we would _want_ EXTENT_SAME to change
> > the mtime? We do a lot of work to make sure that none of the files
> > involved have any sort of content change. Why do we need the flag at all?
>
> Good point, I don't see the usecase for updating MTIME.

Yeah there isn't one and I doubt anyone will be upset if we just always
ignore the mtime update. I'll send some new patches shortly.

Thanks for the suggestion Zygo.
    --Mark

--
Mark Fasheh
Re: [PATCH] Btrfs: Improve FL_KEEP_SIZE handling in fallocate.
On Mon, Apr 20, 2015 at 1:49 PM, Davide Italiano wrote:
> On Mon, Apr 6, 2015 at 10:09 PM, Davide Italiano
> wrote:
>> - We call inode_size_ok() only if FL_KEEP_SIZE isn't specified.
>> - As an optimisation we can skip the call if (off + len)
>>   isn't greater than the current size of the file. This operation
>>   is called under the lock so the less work we do, the better.
>> - If we call inode_size_ok() pass to it the correct value rather
>>   than a more conservative estimation.
>>
>> Signed-off-by: Davide Italiano
>> ---
>>  fs/btrfs/file.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 30982bb..f649bfc 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -2586,9 +2586,13 @@ static long btrfs_fallocate(struct file *file, int mode,
>>         }
>>
>>         mutex_lock(&inode->i_mutex);
>> -       ret = inode_newsize_ok(inode, alloc_end);
>> -       if (ret)
>> -               goto out;
>> +
>> +       if (!(mode & FALLOC_FL_KEEP_SIZE) &&
>> +           offset + len > inode->i_size) {
>> +               ret = inode_newsize_ok(inode, offset + len);
>> +               if (ret)
>> +                       goto out;
>> +       }
>>
>>         if (alloc_start > inode->i_size) {
>>                 ret = btrfs_cont_expand(inode, i_size_read(inode),
>> --
>> 2.3.4
>>
>
> Any comment on this?

Very gentle ping after couple of months.
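The condition the patch adds can be isolated as a predicate: the size check is only needed when the call may change the file size. The sketch below assumes nothing beyond the quoted hunk; `sketch_needs_newsize_check` and `SKETCH_FL_KEEP_SIZE` are hypothetical names, and overflow of `offset + len` is not handled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Same bit value as FALLOC_FL_KEEP_SIZE in <linux/falloc.h>; redefined
 * under a hypothetical name so the sketch builds on any platform.
 */
#define SKETCH_FL_KEEP_SIZE 0x01

/*
 * Returns true when a fallocate-style request could grow the file, so
 * the new size still has to be validated.
 */
bool sketch_needs_newsize_check(int mode, int64_t offset, int64_t len,
                                int64_t i_size)
{
    assert(offset >= 0 && len > 0);

    if (mode & SKETCH_FL_KEEP_SIZE)
        return false;               /* i_size stays as it is */
    return offset + len > i_size;   /* only a growing range matters */
}
```

A range ending exactly at the current size does not grow the file, so the predicate is false in that case, matching the strict `>` in the patch.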
Re: [PATCH] btrfs-progs: Fix defrag threshold overflow
On Wed, Jun 24, 2015 at 04:21:06PM +0200, Patrik Lundquist wrote:
> btrfs fi defrag -t 1T overflows the u32 thresh variable and default, instead
> of max, threshold is used.
>
> Signed-off-by: Patrik Lundquist

Applied, thanks.
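The overflow in the report above is plain 32-bit truncation. Assuming `-t 1T` is parsed as 2^40 bytes, the low 32 bits of that value are all zero, which is consistent with the default threshold being used instead of the maximum. The patch itself is not quoted in this message, so the clamp below is only one plausible shape for a fix; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What assigning a 64-bit size to a u32 does: keep only the low 32 bits. */
uint32_t sketch_truncate_thresh(uint64_t parsed)
{
    return (uint32_t)parsed;
}

/* Clamp instead, so an oversized request means "largest threshold". */
uint32_t sketch_clamp_thresh(uint64_t parsed)
{
    uint32_t clamped = parsed > UINT32_MAX ? UINT32_MAX : (uint32_t)parsed;

    /* The clamped value never exceeds what was asked for. */
    assert((uint64_t)clamped <= parsed);
    return clamped;
}
```

Values that already fit in 32 bits pass through the clamp unchanged.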
Re: [PATCH] btrfs-progs: man mkfs.btrfs: document -O^
On Wed, Jun 24, 2015 at 12:46:13AM +0200, Adam Borowski wrote:
> Signed-off-by: Adam Borowski

Applied, thanks.
Re: NULL pointer dereference during snapshot removal
On Tue, Jun 23, 2015 at 11:10:37AM +0800, Liu Bo wrote:
> On Sat, Jun 20, 2015 at 04:53:24PM +0200, Christoph Biedl wrote:
> > Hi there,
> >
> > I'm having trouble with btrfs where removing a snapshot causes a
> > kernel Oops at blk_get_backing_dev_info+0x10/0x1c (plus or minus a
> > byte bytes). Is this a known issue? Else I'll dig further. Stack
> > traces below.
>
> Can you use gdb to locate the line of blk_get_backing_dev_info+0x10/0x1c?

The helper is trivial:

    struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
    {
            struct request_queue *q = bdev_get_queue(bdev);

            return &q->backing_dev_info;
    }

There are 2 dereferences:

Dump of assembler code for function blk_get_backing_dev_info:
   0xc12aa3c0 <+0>:     push   %ebp
   0xc12aa3c1 <+1>:     mov    %esp,%ebp
   0xc12aa3c3 <+3>:     call   0xc15cbd90

first deref is ok

   0xc12aa3c8 <+8>:     mov    0x5c(%eax),%eax
   0xc12aa3cb <+11>:    pop    %ebp
   0xc12aa3cc <+12>:    mov    0x210(%eax),%eax

this one crashes

   0xc12aa3d2 <+18>:    add    $0xe8,%eax
   0xc12aa3d7 <+23>:    ret

    static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
    {
            return bdev->bd_disk->queue;    /* this is never NULL */
    }

so bdev or bdev->bd_disk might be NULL, but according to the offsets it
seems to be 'bdev->bd_disk'. Strangely, pahole (the structure dumper)
does not work here on the 32bit vmlinux so I can't check exactly, but in
the 64bit build the offset of bd_disk is 152, if we subtract padding and
4B per pointer, this looks plausible.

Anyway, this is below the btrfs layer.
Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
On Thu, Jun 25, 2015 at 09:10:31AM -0400, Austin S Hemmelgarn wrote:
> On 2015-06-25 08:52, David Sterba wrote:
> >On Wed, Jun 24, 2015 at 04:17:32PM -0400, Zygo Blaxell wrote:
> >>Is there any sane use case where we would _want_ EXTENT_SAME to change
> >>the mtime? We do a lot of work to make sure that none of the files
> >>involved have any sort of content change. Why do we need the flag at all?
> >
> >Good point, I don't see the usecase for updating MTIME.
> Was the original intent possibly to make certain the CTIME got
> updated? Because EXTENT_SAME _does_ update the metadata, and by
> logical extension that means that the CTIME should change.

It updates the _extent_ metadata. CTIME does not cover that, only
_inode_ metadata changes.

Put another way, if we updated CTIME for extent metadata changes, then a
balance would imply rewriting every inode on the filesystem, which is
insane.

Same for defragment: when we defragment files, we already leave the
MTIME and CTIME fields alone, and we use locking to protect defrag from
race conditions that might cause it to modify data.

You can confirm the behavior of balance and defrag on v4.0.5:

    # Here's a file:
    root@testhost:~# fiemap /media/usb7/vmlinuz
    File: /media/usb7/vmlinuz
    Log 0x0..0x654000 Phy 0xc1..0x1264000 Flags FIEMAP_EXTENT_LAST
    root@testhost:~# ls --full -lc /media/usb7/vmlinuz
    -rw--- 1 root root 6634160 2015-06-25 12:37:16.265688748 -0400 /media/usb7/vmlinuz

    # Balance:
    root@testhost:~# btrfs balance start /media/usb7
    Done, had to relocate 5 out of 5 chunks

    # Confirm the file is at a new physical location:
    root@testhost:~# fiemap /media/usb7/vmlinuz
    File: /media/usb7/vmlinuz
    Log 0x0..0x654000 Phy 0x2b3d..0x2ba24000 Flags FIEMAP_EXTENT_LAST

    # We did not change the CTIME.
    root@testhost:~# ls --full -lc /media/usb7/vmlinuz
    -rw--- 1 root root 6634160 2015-06-25 12:37:16.265688748 -0400 /media/usb7/vmlinuz

    # Now let's try defrag!
    root@testhost:~# btrfs fi defrag -c /media/usb7/vmlinuz

    # File has been moved again, and parts of it are even compressed now
    root@testhost:~# fiemap /media/usb7/vmlinuz
    File: /media/usb7/vmlinuz
    Log 0x0..0x2 Phy 0x2b39..0x2b3b Flags FIEMAP_EXTENT_ENCODED
    Log 0x2..0x8 Phy 0x2ba64000..0x2bac4000 Flags 0
    Log 0x8..0x10 Phy 0x2bac4000..0x2bb44000 Flags 0
    Log 0x10..0x18 Phy 0x2bb44000..0x2bbc4000 Flags 0
    Log 0x18..0x20 Phy 0x2bbc4000..0x2bc44000 Flags 0
    Log 0x20..0x28 Phy 0x2bc44000..0x2bcc4000 Flags 0
    Log 0x28..0x30 Phy 0x2bcc4000..0x2bd44000 Flags 0
    Log 0x30..0x38 Phy 0x2bd44000..0x2bdc4000 Flags 0
    Log 0x38..0x40 Phy 0x2bdc4000..0x2be44000 Flags 0
    Log 0x40..0x48 Phy 0x2be44000..0x2bec4000 Flags 0
    Log 0x48..0x50 Phy 0x2bec4000..0x2bf44000 Flags 0
    Log 0x50..0x58 Phy 0x2bf44000..0x2bfc4000 Flags 0
    Log 0x58..0x60 Phy 0x2bfc4000..0x2c044000 Flags 0
    Log 0x60..0x654000 Phy 0x2c044000..0x2c098000 Flags FIEMAP_EXTENT_LAST

    # CTIME not changed (no changes to file content).
    root@testhost:~# ls --full -lc /media/usb7/vmlinuz
    -rw--- 1 root root 6634160 2015-06-25 12:37:16.265688748 -0400 /media/usb7/vmlinuz
Re: [PATCH] check: check so offset is not bigger then the leaf
On Thu, Jun 25, 2015 at 09:24:10AM -0700, Josef Bacik wrote:
> > +
> > +       for (i = 0; i < nritems; i++) {
> > +               void *tmp;
> > +
> > +               tmp = btrfs_item_ptr(buf, i, void);
> > +               if ((long)tmp >= BTRFS_LEAF_DATA_SIZE(root)) {
> > +                       ret = BTRFS_TREE_BLOCK_INVALID_OFFSETS;
> > +                       fprintf(stderr, "bad item pointer %lu\n",
> > +                                       (long)tmp);
> > +                       goto fail;
> > +               }
> > +       }
>
> I'd just do
>
>         if (btrfs_item_end_nr(buf, i) >= BTRFS_LEAF_DATA_SIZE(root))
>
> that way you catch problems with offset and size. Thanks,

Ah right, my check would not catch 'offset + size >= leaf data size' if
'offset < leaf data size'. Patch welcome.
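The difference between the two checks can be shown on a simplified item header. The struct and function names below are hypothetical and much simpler than the real btrfs leaf layout. The `<` comparison simply mirrors the `>=` rejection quoted in the thread; whether an end exactly equal to the leaf data size should be accepted is not settled by this exchange.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified item header: start of the item data and its length. */
struct sketch_item {
    uint32_t offset;
    uint32_t size;
};

/* Offset-only check, as in the first draft of the patch. */
bool sketch_offset_ok(const struct sketch_item *it, uint32_t leaf_data_size)
{
    assert(it != NULL);
    return it->offset < leaf_data_size;
}

/* End check: widen to 64 bits so that offset + size cannot wrap around. */
bool sketch_end_ok(const struct sketch_item *it, uint32_t leaf_data_size)
{
    uint64_t end;

    assert(it != NULL);
    end = (uint64_t)it->offset + it->size;
    return end < leaf_data_size;
}
```

An item with offset 100 and size 5000 in a 4000-byte data area passes the first check and fails the second, which is the case the reply describes.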
Re: btrfs filesystem show confused when label is same as mountpoint
On Sat, Jun 13, 2015 at 09:51:41AM +, Duncan wrote:
> Sjoerd posted on Sat, 13 Jun 2015 09:20:12 +0200 as excerpted:
>
> > versus for label:
> > btrfs fi show MULTIMEDIA
> > Label: 'MULTIMEDIA' uuid: ce5d23cd-73a4-4f7c-83cd-2c40d12f6697
>
> Hmm... I wasn't even aware that you could /use/ label! But sure enough,
> it works here, too:
>
> btrfs fi show rt0238gcnx+35l0
> Label: 'rt0238gcnx+35l0' uuid: 8f8d79ef-a86f-4306-a255-e0519e0f6132
>         Total devices 2 FS bytes used 1.94GiB
>         devid    1 size 8.00GiB used 3.78GiB path /dev/sda5
>         devid    2 size 8.00GiB used 3.78GiB path /dev/sdb5
>
> btrfs-progs v4.0.1
>
>
> It works for UUID as well...
>
> btrfs fi show 8f8d79ef-a86f-4306-a255-e0519e0f6132
> Label: 'rt0238gcnx+35l0' uuid: 8f8d79ef-a86f-4306-a255-e0519e0f6132
>         Total devices 2 FS bytes used 1.94GiB
>         devid    1 size 8.00GiB used 3.78GiB path /dev/sda5
>         devid    2 size 8.00GiB used 3.78GiB path /dev/sdb5
>
> btrfs-progs v4.0.1
>
> ... but that's a lot of arbitrary typing.
>
> Doesn't work with partlabel or id (see /dev/disk/by-*), however. =:^(

The commandline tries to guess if it's label/uuid/path. If we want to
add support for partlabel and/or partuuid, we can't use the bare string,
but possibly the blkid tags, like

  $ btrfs fi show PARTUUID="8f8d79ef-a86f-4306-a255-e0519e0f6132"
Re: [PATCH v2 0/5] Btrfs: RAID 5/6 missing device scrub+replace
On Thu, Jun 25, 2015 at 01:03:57PM +0800, wangyf wrote:
> I confirmed this bug report, and found the reason is that
> I compiled the patched module with a dirty kernel.
> This morning I tested this patch again, and didn't see above error, this
> patch is OK.
> Sorry for this bug report. : (

It's no problem! Do either of you feel like providing your Tested-by?

--
Omar
Re: [PATCH] check: check so offset is not bigger then the leaf
On 06/25/2015 09:06 AM, David Sterba wrote:
> On Thu, Jun 18, 2015 at 10:16:54AM -0700, Josef Bacik wrote:
>> On 06/18/2015 09:44 AM, David Sterba wrote:
>>> On Thu, Jun 18, 2015 at 01:59:13AM +0200, Robert Marklund wrote:
>>>> This could crash before because of dangerous dangling
>>>> offset of pointer.
>>>
>>> That's right, this can happen. There are more btrfs_item_ptr that would
>>> be good to validate that way, namely in the checker as it's most likely
>>> to see corrupted data.
>>>
>>
>> The check_block stuff should be doing this, if it isn't that's where we
>> need to fix it. Thanks,
>
> Something like that?
>
> --- a/ctree.c
> +++ b/ctree.c
> @@ -521,6 +521,19 @@ btrfs_check_leaf(struct btrfs_root *root, struct btrfs_disk_key *parent_key,
>                         goto fail;
>                 }
>         }
> +
> +       for (i = 0; i < nritems; i++) {
> +               void *tmp;
> +
> +               tmp = btrfs_item_ptr(buf, i, void);
> +               if ((long)tmp >= BTRFS_LEAF_DATA_SIZE(root)) {
> +                       ret = BTRFS_TREE_BLOCK_INVALID_OFFSETS;
> +                       fprintf(stderr, "bad item pointer %lu\n",
> +                                       (long)tmp);
> +                       goto fail;
> +               }
> +       }

I'd just do

        if (btrfs_item_end_nr(buf, i) >= BTRFS_LEAF_DATA_SIZE(root))

that way you catch problems with offset and size. Thanks,

Josef
Re: [RFC PATCH v2 2/2] Btrfs: improve fsync for nocow file
On Thu, Jun 25, 2015 at 10:24:25AM +0800, Liu Bo wrote:
> > I'm not sure I understand, you mean split the NOISIZE into two bits and
> > use NOISIZE just for inode size change and the other one for the
> > cow_file_range case?
>
> Yes, for now it has mixed meanings, either changing i_size or doing Cow.
> But I think it'd better to leave it mixed if we document it well.

Agreed, please also comment the new fsync code that actually uses the
nocow + notimestamp + noisize bits.
Re: [PATCH] check: check so offset is not bigger then the leaf
On Thu, Jun 18, 2015 at 10:16:54AM -0700, Josef Bacik wrote:
> On 06/18/2015 09:44 AM, David Sterba wrote:
> > On Thu, Jun 18, 2015 at 01:59:13AM +0200, Robert Marklund wrote:
> >> This could crash before because of dangerous dangling
> >> offset of pointer.
> >
> > That's right, this can happen. There are more btrfs_item_ptr that would
> > be good to validate that way, namely in the checker as it's most likely
> > to see corrupted data.
> >
>
> The check_block stuff should be doing this, if it isn't that's where we
> need to fix it. Thanks,

Something like that?

--- a/ctree.c
+++ b/ctree.c
@@ -521,6 +521,19 @@ btrfs_check_leaf(struct btrfs_root *root, struct btrfs_disk_key *parent_key,
                        goto fail;
                }
        }
+
+       for (i = 0; i < nritems; i++) {
+               void *tmp;
+
+               tmp = btrfs_item_ptr(buf, i, void);
+               if ((long)tmp >= BTRFS_LEAF_DATA_SIZE(root)) {
+                       ret = BTRFS_TREE_BLOCK_INVALID_OFFSETS;
+                       fprintf(stderr, "bad item pointer %lu\n",
+                                       (long)tmp);
+                       goto fail;
+               }
+       }
+
        return BTRFS_TREE_BLOCK_CLEAN;
 fail:
        if (btrfs_header_owner(buf) == BTRFS_EXTENT_TREE_OBJECTID) {
---

Compile-tested only.
Re: Btrfs stable updates for 4.0
On Thu, Jun 25, 2015 at 05:10:29PM +0200, David Sterba wrote:
> On Fri, Jun 19, 2015 at 01:31:31PM -0700, Greg KH wrote:
> > > 5cc2b17e80cf5770f2e585c2d90fd8af1b901258 # 3.14+
> >
> > Does not build on 3.14+, sorry. Please provide a backported version if
> > you want to see it there.
>
> I'm sorry about the hassle with applying to older versions. My main
> goal is to provide a set of patches for the latest stable series, and
> they get reviewed and tested properly. I try to look whether the patches
> are relevant for older versions but this takes extra time. I don't have
> particular interest in these so it's only best effort, but so far it
> hasn't met the 'best' premise. I'll try better next time.

That's fine, no need to work hard for any longterm kernel if you don't
want to, just letting you know that this patch didn't work there. I
don't care if it doesn't make it if you don't :)

thanks,

greg k-h
Re: [PATCH RFC] btrfs: csum: Introduce partial csum for tree block.
On Fri, Jun 19, 2015 at 09:26:11AM +0800, Qu Wenruo wrote:
> > I agree with that. I'm still not convinced that adding all the kernel
> > code to repair the data is justified, compared to the block-level
> > redundancy alternatives.
>
> Totally agree with this.
> That's why we have support for RAID1/5/6/10.
>
> I also hate to add complexity to kernel codes, especially when the scrub
> codes are already quite complex.
>
> But in fact, my teammate Zhao Lei is already doing some work to make
> scrub codes clean and neat.

Doing cleanups is a good thing regardless of new features, please don't
hesitate to post them even if we do not agree to implement the partial
csum/repair. I'm not against adding the partial csums & repair, but at
the moment I'm not convinced.
Re: Btrfs stable updates for 4.0
On Fri, Jun 19, 2015 at 01:31:31PM -0700, Greg KH wrote:
> > 5cc2b17e80cf5770f2e585c2d90fd8af1b901258 # 3.14+
>
> Does not build on 3.14+, sorry. Please provide a backported version if
> you want to see it there.

I'm sorry about the hassle with applying to older versions. My main
goal is to provide a set of patches for the latest stable series, and
they get reviewed and tested properly. I try to look whether the patches
are relevant for older versions but this takes extra time. I don't have
particular interest in these so it's only best effort, but so far it
hasn't met the 'best' premise. I'll try better next time.
Re: BTRFS balance fails with -dusage=100
On 06/24/2015 12:33 PM, Moby wrote:
> On 06/22/2015 10:53 PM, Moby wrote:
>> OpenSuSE 13.2 system with single BTRFS / mounted on top of /dev/md1.
>> /dev/md1 is md raid5 across 4 SATA disks.
>>
>> System details are:
>>
>> Linux suse132 4.0.5-4.g56152db-default #1 SMP Thu Jun 18 15:11:06 UTC
>> 2015 (56152db) x86_64 x86_64 x86_64 GNU/Linux
>>
>> btrfs-progs v4.1+20150622
>>
>> Label: none uuid: 33b98d97-606b-4968-a266-24a48a9fe50d
>>         Total devices 1 FS bytes used 884.21GiB
>>         devid    1 size 1.36TiB used 889.06GiB path /dev/md1
>>
>> Data, single: total=885.00GiB, used=883.12GiB
>> System, DUP: total=32.00MiB, used=144.00KiB
>> Metadata, DUP: total=2.00GiB, used=1.09GiB
>> GlobalReserve, single: total=384.00MiB, used=0.00B
>>
>> Relevant entries from log are:
>> 2015-06-22T22:46:32.238011-05:00 suse132 kernel: [90193.446128] BTRFS: bdev /dev/md1 errs: wr 9977, rd 0, flush 0, corrupt 0, gen 0
>> 2015-06-22T22:46:32.238050-05:00 suse132 kernel: [90193.446158] BTRFS: bdev /dev/md1 errs: wr 9978, rd 0, flush 0, corrupt 0, gen 0
>> 2015-06-22T22:46:32.238054-05:00 suse132 kernel: [90193.446179] BTRFS: bdev /dev/md1 errs: wr 9979, rd 0, flush 0, corrupt 0, gen 0
>>
>> System was (still is - other than btrfs balance) running fine. Then I
>> did massive data I/O, copying and deleting massive amounts of data, to
>> bring the system into its present state.
>> Once I was done with the I/O, kicked off btrfs balance start /.
>> Above command failed.
>> Then I started doing btrfs balance -dusage=XX /
>> This command succeeds with XX up to and including 99. It fails when I
>> set XX to 100. btrfs balance also fails if I omit the -dusage option.
>>
>> The errors in the log make no sense to me since the md raid device is
>> not reporting any errors at all. Also running btrfs scrub reports no
>> errors at all.
>>
>> Any ideas on how to get btrfs balance to succeed without errors would
>> be welcome.
>>
>> Regards,
>>
>> --Moby
>
> On another run with -dusage=95, I am now seeing the following (negative
> percentage left value!)
>
> Every 15.0s: sh -c date;btrfs balance status -v /    Wed Jun 24 12:29:12 2015
>
> Wed Jun 24 12:29:12 CDT 2015
> Balance on '/' is running
> 306 out of about 145 chunks balanced (312 considered), -111% left
> Dumping filters: flags 0x1, state 0x1, force is off
>   DATA (flags 0x2): balancing, usage=95
>
> --Moby

Upgrading to kernel 4.1.0-1.gfcf8349-default and btrfs-progs
v4.1+20150622 seems to have fixed the problem. btrfs balance now
completes without any errors.

--
--Moby

They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety. -- Benjamin Franklin
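The negative "% left" in the status output above follows directly from the two counters once the completed count exceeds the estimate. One formula that reproduces the reported number is 100 * (1 - completed / expected); that this is how the tool computes the figure is an assumption made for this sketch, and `sketch_percent_left` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Percentage of work left, given a completed count and an estimated
 * total. When the estimate is too low the result goes negative.
 */
double sketch_percent_left(uint64_t completed, uint64_t expected)
{
    assert(expected > 0);
    return 100.0 * (1.0 - (double)completed / (double)expected);
}
```

For 306 completed out of an estimated 145, this gives 100 * (1 - 2.1103...), roughly -111.03, which prints as the -111% seen in the report.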
Re: [PATCH] Btrfs: fix fsync after truncate when no_holes feature is enabled
On Thu, Jun 25, 2015 at 3:20 PM, Liu Bo wrote: > On Thu, Jun 25, 2015 at 04:17:46AM +0100, fdman...@kernel.org wrote: >> From: Filipe Manana >> >> When we have the no_holes feature enabled, if a we truncate a file to a >> smaller size, truncate it again but to a size greater than or equals to >> its original size and fsync it, the log tree will not have any information >> about the hole covering the range [truncate_1_offset, new_file_size[. >> Which means if the fsync log is replayed, the file will remain with the >> state it had before both truncate operations. > > Does the fs/subvol tree get updated to the right information at this > time? No, and that's the problem. Because no file extent items are stored in the log tree. The inode item is updated with the new i_size however (as expected). thanks > > Thanks, > > -liubo >> >> Without the no_holes feature this does not happen, since when the inode >> is logged (full sync flag is set) it will find in the fs/subvol tree a >> leaf with a generation matching the current transaction id that has an >> explicit extent item representing the hole. >> >> Fix this by adding an explicit extent item representing a hole between >> the last extent and the inode's i_size if we are doing a full sync. >> >> The issue is easy to reproduce with the following test case for fstests: >> >> . ./common/rc >> . ./common/filter >> . ./common/dmflakey >> >> _need_to_be_root >> _supported_fs generic >> _supported_os Linux >> _require_scratch >> _require_dm_flakey >> >> # This test was motivated by an issue found in btrfs when the btrfs >> # no-holes feature is enabled (introduced in kernel 3.14). So enable >> # the feature if the fs being tested is btrfs. 
>> if [ $FSTYP == "btrfs" ]; then >> _require_btrfs_fs_feature "no_holes" >> _require_btrfs_mkfs_feature "no-holes" >> MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes" >> fi >> >> rm -f $seqres.full >> >> _scratch_mkfs >>$seqres.full 2>&1 >> _init_flakey >> _mount_flakey >> >> # Create our test files and make sure everything is durably persisted. >> $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" \ >> -c "pwrite -S 0xbb 64K 61K" \ >> $SCRATCH_MNT/foo | _filter_xfs_io >> $XFS_IO_PROG -f -c "pwrite -S 0xee 0 64K" \ >> -c "pwrite -S 0xff 64K 61K" \ >> $SCRATCH_MNT/bar | _filter_xfs_io >> sync >> >> # Now truncate our file foo to a smaller size (64Kb) and then truncate >> # it to the size it had before the shrinking truncate (125Kb). Then >> # fsync our file. If a power failure happens after the fsync, we expect >> # our file to have a size of 125Kb, with the first 64Kb of data having >> # the value 0xaa and the second 61Kb of data having the value 0x00. >> $XFS_IO_PROG -c "truncate 64K" \ >>-c "truncate 125K" \ >>-c "fsync" \ >>$SCRATCH_MNT/foo >> >> # Do something similar to our file bar, but the first truncation sets >> # the file size to 0 and the second truncation expands the size to the >> # double of what it was initially. >> $XFS_IO_PROG -c "truncate 0" \ >>-c "truncate 253K" \ >>-c "fsync" \ >>$SCRATCH_MNT/bar >> >> _load_flakey_table $FLAKEY_DROP_WRITES >> _unmount_flakey >> >> # Allow writes again, mount to trigger log replay and validate file >> # contents. >> _load_flakey_table $FLAKEY_ALLOW_WRITES >> _mount_flakey >> >> # We expect foo to have a size of 125Kb, the first 64Kb of data all >> # having the value 0xaa and the remaining 61Kb to be a hole (all bytes >> # with value 0x00). >> echo "File foo content after log replay:" >> od -t x1 $SCRATCH_MNT/foo >> >> # We expect bar to have a size of 253Kb and no extents (any byte read >> # from bar has the value 0x00). 
>> echo "File bar content after log replay:" >> od -t x1 $SCRATCH_MNT/bar >> >> status=0 >> exit >> >> The expected file contents in the golden output are: >> >> File foo content after log replay: >> 000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa >> * >> 020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> * >> 0372000 >> File bar content after log replay: >> 000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> * >> 0772000 >> >> Without this fix, their contents are: >> >> File foo content after log replay: >> 000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa >> * >> 020 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb >> * >> 0372000 >> File bar content after log replay: >> 000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee >> * >> 020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> * >> 0372000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> * >> 0772000 >> >> A test case submission for fstests follows soon. >> >> Signed-off-by: Filipe Manana >> --- >> fs/btrfs/tree-log.c |
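The invariant checked by the quoted test (after a shrinking truncate followed by an expanding one, the re-exposed range reads back as zeros) can also be exercised with plain POSIX calls. The sketch below does not simulate a power failure or log replay, so it cannot reproduce the btrfs bug itself; it only shows the file contents the golden output expects. The function name and the scratch path under /tmp are assumptions made for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write initial_size bytes of 0xaa to a scratch file, truncate it to
 * shrink_to, truncate it again to final_size, fsync, then check that
 * [shrink_to, final_size) reads back as zeros.
 * Returns 1 if it does, 0 if stale data is visible, -1 on an I/O error.
 */
int sketch_truncate_zero_fill(off_t initial_size, off_t shrink_to,
                              off_t final_size)
{
    char path[] = "/tmp/trunc-sketch-XXXXXX";
    unsigned char buf[4096];
    off_t pos;
    int fd;

    assert(shrink_to >= 0 && shrink_to <= initial_size &&
           shrink_to <= final_size);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the open descriptor keeps the file alive */

    memset(buf, 0xaa, sizeof(buf));
    for (pos = 0; pos < initial_size; pos += (off_t)sizeof(buf)) {
        off_t left = initial_size - pos;
        size_t n = left < (off_t)sizeof(buf) ? (size_t)left : sizeof(buf);

        if (pwrite(fd, buf, n, pos) != (ssize_t)n)
            goto fail;
    }

    if (ftruncate(fd, shrink_to) != 0 ||
        ftruncate(fd, final_size) != 0 ||
        fsync(fd) != 0)
        goto fail;

    for (pos = shrink_to; pos < final_size; pos += (off_t)sizeof(buf)) {
        off_t left = final_size - pos;
        size_t n = left < (off_t)sizeof(buf) ? (size_t)left : sizeof(buf);
        size_t i;

        if (pread(fd, buf, n, pos) != (ssize_t)n)
            goto fail;
        for (i = 0; i < n; i++) {
            if (buf[i] != 0) {
                close(fd);
                return 0;
            }
        }
    }

    close(fd);
    return 1;

fail:
    close(fd);
    return -1;
}
```

The two cases in the quoted test correspond to sizes (125K, 64K, 125K) for foo and (125K, 0, 253K) for bar.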
Re: [PATCH] Btrfs: fix fsync after truncate when no_holes feature is enabled
On Thu, Jun 25, 2015 at 04:17:46AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> When we have the no_holes feature enabled, if we truncate a file to a
> smaller size, truncate it again but to a size greater than or equal to
> its original size and fsync it, the log tree will not have any information
> about the hole covering the range [truncate_1_offset, new_file_size[.
> Which means if the fsync log is replayed, the file will remain with the
> state it had before both truncate operations.

Does the fs/subvol tree get updated to the right information at this time?

Thanks,

-liubo

>
> Without the no_holes feature this does not happen, since when the inode
> is logged (full sync flag is set) it will find in the fs/subvol tree a
> leaf with a generation matching the current transaction id that has an
> explicit extent item representing the hole.
>
> Fix this by adding an explicit extent item representing a hole between
> the last extent and the inode's i_size if we are doing a full sync.
>
> The issue is easy to reproduce with the following test case for fstests:
>
>   . ./common/rc
>   . ./common/filter
>   . ./common/dmflakey
>
>   _need_to_be_root
>   _supported_fs generic
>   _supported_os Linux
>   _require_scratch
>   _require_dm_flakey
>
>   # This test was motivated by an issue found in btrfs when the btrfs
>   # no-holes feature is enabled (introduced in kernel 3.14). So enable
>   # the feature if the fs being tested is btrfs.
>   if [ $FSTYP == "btrfs" ]; then
>       _require_btrfs_fs_feature "no_holes"
>       _require_btrfs_mkfs_feature "no-holes"
>       MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
>   fi
>
>   rm -f $seqres.full
>
>   _scratch_mkfs >>$seqres.full 2>&1
>   _init_flakey
>   _mount_flakey
>
>   # Create our test files and make sure everything is durably persisted.
>   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" \
>                   -c "pwrite -S 0xbb 64K 61K" \
>                   $SCRATCH_MNT/foo | _filter_xfs_io
>   $XFS_IO_PROG -f -c "pwrite -S 0xee 0 64K" \
>                   -c "pwrite -S 0xff 64K 61K" \
>                   $SCRATCH_MNT/bar | _filter_xfs_io
>   sync
>
>   # Now truncate our file foo to a smaller size (64Kb) and then truncate
>   # it to the size it had before the shrinking truncate (125Kb). Then
>   # fsync our file. If a power failure happens after the fsync, we expect
>   # our file to have a size of 125Kb, with the first 64Kb of data having
>   # the value 0xaa and the second 61Kb of data having the value 0x00.
>   $XFS_IO_PROG -c "truncate 64K" \
>                -c "truncate 125K" \
>                -c "fsync" \
>                $SCRATCH_MNT/foo
>
>   # Do something similar to our file bar, but the first truncation sets
>   # the file size to 0 and the second truncation expands the size to the
>   # double of what it was initially.
>   $XFS_IO_PROG -c "truncate 0" \
>                -c "truncate 253K" \
>                -c "fsync" \
>                $SCRATCH_MNT/bar
>
>   _load_flakey_table $FLAKEY_DROP_WRITES
>   _unmount_flakey
>
>   # Allow writes again, mount to trigger log replay and validate file
>   # contents.
>   _load_flakey_table $FLAKEY_ALLOW_WRITES
>   _mount_flakey
>
>   # We expect foo to have a size of 125Kb, the first 64Kb of data all
>   # having the value 0xaa and the remaining 61Kb to be a hole (all bytes
>   # with value 0x00).
>   echo "File foo content after log replay:"
>   od -t x1 $SCRATCH_MNT/foo
>
>   # We expect bar to have a size of 253Kb and no extents (any byte read
>   # from bar has the value 0x00).
>   echo "File bar content after log replay:"
>   od -t x1 $SCRATCH_MNT/bar
>
>   status=0
>   exit
>
> The expected file contents in the golden output are:
>
>   File foo content after log replay:
>   0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
>   *
>   0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   *
>   0372000
>   File bar content after log replay:
>   0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   *
>   0772000
>
> Without this fix, their contents are:
>
>   File foo content after log replay:
>   0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
>   *
>   0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
>   *
>   0372000
>   File bar content after log replay:
>   0000000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee
>   *
>   0200000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>   *
>   0372000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   *
>   0772000
>
> A test case submission for fstests follows soon.
>
> Signed-off-by: Filipe Manana
> ---
>  fs/btrfs/tree-log.c | 108
>  1 file changed, 108 insertions(+)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 7ac45cf..ac90336 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -4203,6 +4203,107 @@ static int btrfs_log_all_xattrs(struct btrfs_trans_hand
[PATCH] Btrfs: fix fsync after truncate when no_holes feature is enabled
From: Filipe Manana

When we have the no_holes feature enabled, if we truncate a file to a
smaller size, truncate it again but to a size greater than or equal to
its original size and fsync it, the log tree will not have any information
about the hole covering the range [truncate_1_offset, new_file_size[.
Which means if the fsync log is replayed, the file will remain with the
state it had before both truncate operations.

Without the no_holes feature this does not happen, since when the inode
is logged (full sync flag is set) it will find in the fs/subvol tree a
leaf with a generation matching the current transaction id that has an
explicit extent item representing the hole.

Fix this by adding an explicit extent item representing a hole between
the last extent and the inode's i_size if we are doing a full sync.

The issue is easy to reproduce with the following test case for fstests:

  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  _need_to_be_root
  _supported_fs generic
  _supported_os Linux
  _require_scratch
  _require_dm_flakey

  # This test was motivated by an issue found in btrfs when the btrfs
  # no-holes feature is enabled (introduced in kernel 3.14). So enable
  # the feature if the fs being tested is btrfs.
  if [ $FSTYP == "btrfs" ]; then
      _require_btrfs_fs_feature "no_holes"
      _require_btrfs_mkfs_feature "no-holes"
      MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
  fi

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our test files and make sure everything is durably persisted.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" \
                  -c "pwrite -S 0xbb 64K 61K" \
                  $SCRATCH_MNT/foo | _filter_xfs_io
  $XFS_IO_PROG -f -c "pwrite -S 0xee 0 64K" \
                  -c "pwrite -S 0xff 64K 61K" \
                  $SCRATCH_MNT/bar | _filter_xfs_io
  sync

  # Now truncate our file foo to a smaller size (64Kb) and then truncate
  # it to the size it had before the shrinking truncate (125Kb). Then
  # fsync our file. If a power failure happens after the fsync, we expect
  # our file to have a size of 125Kb, with the first 64Kb of data having
  # the value 0xaa and the second 61Kb of data having the value 0x00.
  $XFS_IO_PROG -c "truncate 64K" \
               -c "truncate 125K" \
               -c "fsync" \
               $SCRATCH_MNT/foo

  # Do something similar to our file bar, but the first truncation sets
  # the file size to 0 and the second truncation expands the size to the
  # double of what it was initially.
  $XFS_IO_PROG -c "truncate 0" \
               -c "truncate 253K" \
               -c "fsync" \
               $SCRATCH_MNT/bar

  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  # Allow writes again, mount to trigger log replay and validate file
  # contents.
  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # We expect foo to have a size of 125Kb, the first 64Kb of data all
  # having the value 0xaa and the remaining 61Kb to be a hole (all bytes
  # with value 0x00).
  echo "File foo content after log replay:"
  od -t x1 $SCRATCH_MNT/foo

  # We expect bar to have a size of 253Kb and no extents (any byte read
  # from bar has the value 0x00).
  echo "File bar content after log replay:"
  od -t x1 $SCRATCH_MNT/bar

  status=0
  exit

The expected file contents in the golden output are:

  File foo content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0372000
  File bar content after log replay:
  0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0772000

Without this fix, their contents are:

  File foo content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
  *
  0372000
  File bar content after log replay:
  0000000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee
  *
  0200000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  *
  0372000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0772000

A test case submission for fstests follows soon.

Signed-off-by: Filipe Manana
---
 fs/btrfs/tree-log.c | 108
 1 file changed, 108 insertions(+)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7ac45cf..ac90336 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4203,6 +4203,107 @@ static int btrfs_log_all_xattrs(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+/*
+ * If the no holes feature is enabled we need to make sure any hole between the
+ * last extent and the i_size of our inode is explicitly marked in the log. This
+ * is to make sure that doing something like:
+ *
+ *      1) create file with 128Kb of data
+ *      2) truncate file to 64Kb
+ *      3) truncate file to 256Kb
+ *      4) fsync file
+ *      5)
+ *      6) mount fs and trigger log
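For readers without a dm-flakey setup, the read-side expectation behind the golden output for foo can be checked on any filesystem. The sketch below is not part of the patch: it assumes GNU coreutils (truncate, stat, od) as a stand-in for xfs_io, works on a scratch file created with mktemp, and exercises only the truncate semantics. It does not involve fsync log replay or a power failure, so it cannot reproduce the btrfs bug.

```shell
#!/bin/bash
# Userspace sketch only (assumes GNU coreutils): checks what a read must
# return after a shrinking truncate followed by an expanding truncate.
set -eu
export LC_ALL=C

dir=$(mktemp -d)
foo="$dir/foo"

# 64K of 0xaa followed by 61K of 0xbb, like the two pwrite calls in the test.
# \252 and \273 are the octal escapes for 0xaa and 0xbb.
head -c $((64 * 1024)) /dev/zero | tr '\0' '\252' > "$foo"
head -c $((61 * 1024)) /dev/zero | tr '\0' '\273' >> "$foo"

# Shrink to 64K, then grow back to the original 125K.
truncate -s 64K "$foo"
truncate -s 125K "$foo"

foo_size=$(stat -c %s "$foo")
# Count the non-zero bytes in the range [64K, 125K[; a hole has none.
tail_nonzero=$(tail -c +$((64 * 1024 + 1)) "$foo" | tr -d '\0' | wc -c)
# od prints offsets in octal: 64K is 0200000 and 125K is 0372000.
foo_dump=$(od -t x1 "$foo")

echo "size=$foo_size tail_nonzero=$tail_nonzero"
echo "$foo_dump"
rm -rf "$dir"
```

The dump should match the golden output for foo: 0xaa up to offset 0200000 (64K) and zeros from there up to 0372000 (125K).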
[PATCH] fstests: generic test for fsync after file truncations
From: Filipe Manana

Test that if we truncate a file to a smaller size, then truncate it to
its original size or a larger size, then fsync it and a power failure
happens, the file will have the range [first_truncate_size, last_size[
with all bytes having a value of 0x00 if we read it the next time the
filesystem is mounted.

This test is motivated by a bug found in btrfs, which is fixed by a patch
titled: "Btrfs: fix fsync after truncate when no_holes feature is enabled"

Tested against ext3/4, xfs, btrfs (with and without the fix, and with the
no_holes feature disabled), f2fs, reiserfs and nilfs2.

All filesystems pass the test except for unpatched btrfs with the
no_holes feature enabled (as expected) and f2fs. Both produce the
following file contents that differ from the golden output:

  File foo content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
  *
  0372000
  File bar content after log replay:
  0000000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee
  *
  0200000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  *
  0372000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0772000

Signed-off-by: Filipe Manana
---
 tests/generic/095     | 116 ++
 tests/generic/095.out |  19 +
 tests/generic/group   |   1 +
 3 files changed, 136 insertions(+)
 create mode 100755 tests/generic/095
 create mode 100644 tests/generic/095.out

diff --git a/tests/generic/095 b/tests/generic/095
new file mode 100755
index 0000000..bfd4112
--- /dev/null
+++ b/tests/generic/095
@@ -0,0 +1,116 @@
+#! /bin/bash
+# FSQA Test No. 095
+#
+# Test that if we truncate a file to a smaller size, then truncate it to its
+# original size or a larger size, then fsync it and a power failure happens,
+# the file will have the range [first_truncate_size, last_size[ with all bytes
+# having a value of 0x00 if we read it the next time the filesystem is mounted.
+#
+# This test is motivated by a bug found in btrfs.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	_cleanup_flakey
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_dm_flakey
+
+# This test was motivated by an issue found in btrfs when the btrfs no-holes
+# feature is enabled (introduced in kernel 3.14). So enable the feature if the
+# fs being tested is btrfs.
+if [ $FSTYP == "btrfs" ]; then
+	_require_btrfs_fs_feature "no_holes"
+	_require_btrfs_mkfs_feature "no-holes"
+	MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
+fi
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_init_flakey
+_mount_flakey
+
+# Create our test files and make sure everything is durably persisted.
+$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" \
+		-c "pwrite -S 0xbb 64K 61K" \
+		$SCRATCH_MNT/foo | _filter_xfs_io
+$XFS_IO_PROG -f -c "pwrite -S 0xee 0 64K" \
+		-c "pwrite -S 0xff 64K 61K" \
+		$SCRATCH_MNT/bar | _filter_xfs_io
+sync
+
+# Now truncate our file foo to a smaller size (64Kb) and then truncate it to the
+# size it had before the shrinking truncate (125Kb). Then fsync our file. If a
+# power failure happens after the fsync, we expect our file to have a size of
+# 125Kb, with the first 64Kb of data having the value 0xaa and the second 61Kb
+# of data having the value 0x00.
+$XFS_IO_PROG -c "truncate 64K" \
+	     -c "truncate 125K" \
+	     -c "fsync" \
+	     $SCRATCH_MNT/foo
+
+# Do something similar to our file bar, but the first truncation sets the file
+# size to 0 and the second truncation expands the size to the double of what it
+# was initially.
+$XFS_IO_PROG -c "truncate 0" \
+
[PATCH] fstests: generic test for truncating a file into the middle of a hole
From: Filipe Manana

Test that truncating a file into the middle of a hole causes the new
size of the file to be persisted after a clean unmount of the filesystem
(or after the inode is evicted). This is for the case where all the data
following the hole is not yet durably persisted, that is, that data is
only present in the page cache.

This test is motivated by an issue found in btrfs, which got fixed by the
patch titled: "Btrfs: fix shrinking truncate when the no_holes feature is
enabled"

Signed-off-by: Filipe Manana
---
 tests/generic/094     | 91 +++
 tests/generic/094.out | 11 +++
 tests/generic/group   |  1 +
 3 files changed, 103 insertions(+)
 create mode 100755 tests/generic/094
 create mode 100644 tests/generic/094.out

diff --git a/tests/generic/094 b/tests/generic/094
new file mode 100755
index 0000000..0876eb7
--- /dev/null
+++ b/tests/generic/094
@@ -0,0 +1,91 @@
+#! /bin/bash
+# FSQA Test No. 094
+#
+# Test that truncating a file into the middle of a hole causes the new size of
+# the file to be persisted after a clean unmount of the filesystem (or after
+# the inode is evicted). This is for the case where all the data following the
+# hole is not yet durably persisted, that is, that data is only present in the
+# page cache.
+#
+# This test is motivated by an issue found in btrfs.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+
+# This test was motivated by an issue found in btrfs when the btrfs no-holes
+# feature is enabled (introduced in kernel 3.14). So enable the feature if the
+# fs being tested is btrfs.
+if [ $FSTYP == "btrfs" ]; then
+	_require_btrfs_fs_feature "no_holes"
+	_require_btrfs_mkfs_feature "no-holes"
+	MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
+fi
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# Create our test file with some data and durably persist it.
+$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 128K" $SCRATCH_MNT/foo | _filter_xfs_io
+sync
+
+# Append some data to the file, increasing its size, and leave a hole between
+# the old size and the start offset of the following write. So our file gets
+# a hole in the range [128Kb, 256Kb[.
+$XFS_IO_PROG -c "pwrite -S 0xbb 256K 32K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+# Now truncate our file to a smaller size that is in the middle of the hole we
+# previously created. On most truncate implementations the data we appended
+# before gets discarded from memory (with truncate_setsize()) and never ends
+# up being written to disk.
+$XFS_IO_PROG -c "truncate 160K" $SCRATCH_MNT/foo
+
+_scratch_remount
+
+# We expect to see a file with a size of 160Kb, with the first 128Kb of data all
+# having the value 0xaa and the remaining 32Kb of data all having the value 0x00
+echo "File content after remount:"
+od -t x1 $SCRATCH_MNT/foo
+
+status=0
+exit
diff --git a/tests/generic/094.out b/tests/generic/094.out
new file mode 100644
index 0000000..47d1b63
--- /dev/null
+++ b/tests/generic/094.out
@@ -0,0 +1,11 @@
+QA output created by 094
+wrote 131072/131072 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset 262144
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+File content after remount:
+0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
+*
+0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+*
+0500000
diff --git a/tests/generic/group b/tests/generic/group
index ae40fed..c14ddb9 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -96,6 +96,7 @@
 091 rw auto quick
 092 auto quick prealloc
 093 attr cap udf auto
+094 auto quick metadata
 097 udf auto
 099 udf auto
 100 udf auto
-- 
2.1.3
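The in-memory half of this test can be tried by hand without fstests. A minimal sketch, assuming GNU coreutils and dd in place of xfs_io and the fstests helpers: it checks only the size and contents right after the truncate, so it cannot reproduce the btrfs bug, which shows up only after a remount or after the inode is evicted.

```shell
#!/bin/bash
# Userspace sketch only (assumes GNU coreutils and dd): truncate a file into
# the middle of a hole and check what is visible before any remount.
set -eu
export LC_ALL=C

dir=$(mktemp -d)
foo="$dir/foo"

# 128K of 0xaa (octal escape \252).
head -c $((128 * 1024)) /dev/zero | tr '\0' '\252' > "$foo"

# 32K of 0xbb (octal escape \273) at offset 256K, which leaves a hole in
# the range [128K, 256K[.
head -c $((32 * 1024)) /dev/zero | tr '\0' '\273' |
	dd of="$foo" bs=1K seek=256 conv=notrunc status=none
size_before=$(stat -c %s "$foo")

# Truncate to 160K, which is inside the hole.
truncate -s 160K "$foo"
size_after=$(stat -c %s "$foo")

# Everything past 128K must now read as zeros.
hole_nonzero=$(tail -c +$((128 * 1024 + 1)) "$foo" | tr -d '\0' | wc -c)
# The last line printed by od is the file size in octal (160K is 0500000).
last_offset=$(od -t x1 "$foo" | tail -n 1)

echo "before=$size_before after=$size_after nonzero=$hole_nonzero end=$last_offset"
rm -rf "$dir"
```

To see the problem described in the patch, the size would have to be read again after unmounting and mounting the filesystem, which is what _scratch_remount does in the test.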
[PATCH] Btrfs: fix shrinking truncate when the no_holes feature is enabled
From: Filipe Manana

If the no_holes feature is enabled and we attempt to shrink a file to a
size that ends up in the middle of a hole, and we don't have any file
extent items in the fs/subvol tree that go beyond the new file size (or
any ordered extents that will insert such file extent items), we end up
not updating the inode's disk_i_size; we only update the inode's i_size.

This means that after unmounting and mounting the filesystem, or after
the inode is evicted and reloaded, its i_size ends up being incorrect (an
inode's i_size is set to the disk_i_size field when an inode is loaded).

This happens when btrfs_truncate_inode_items() doesn't find any file
extent items to drop - in this case it never makes a call to
btrfs_ordered_update_i_size() in order to update the inode's disk_i_size.

Example reproducer:

  $ mkfs.btrfs -O no-holes -f /dev/sdd
  $ mount /dev/sdd /mnt

  # Create our test file with some data and durably persist it.
  $ xfs_io -f -c "pwrite -S 0xaa 0 128K" /mnt/foo
  $ sync

  # Append some data to the file, increasing its size, and leave a hole
  # between the old size and the start offset of the following write. So
  # our file gets a hole in the range [128Kb, 256Kb[.
  $ xfs_io -c "pwrite -S 0xbb 256K 32K" /mnt/foo

  # Now truncate our file to a smaller size that is in the middle of the
  # hole we previously created.
  $ xfs_io -c "truncate 160K" /mnt/foo

  # We expect to see our file with a size of 160Kb, with the first 128Kb
  # of data all having the value 0xaa and the remaining 32Kb of data all
  # having the value 0x00.
  $ od -t x1 /mnt/foo
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0500000

  # Now cleanly unmount and mount again the filesystem.
  $ umount /mnt
  $ mount /dev/sdd /mnt

  # We expect to get the same result as before, a file with a size of
  # 160Kb, with the first 128Kb of data all having the value 0xaa and the
  # remaining 32Kb of data all having the value 0x00.
  $ od -t x1 /mnt/foo
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0400000

In the example above the file size/data do not match what they were before
the remount.

Fix this by always calling btrfs_ordered_update_i_size() with a size
matching the size the file was truncated to if btrfs_truncate_inode_items()
is not called for a log tree and no file extent items were dropped. This
ensures the same behaviour as when the no_holes feature is not enabled.

A test case for fstests follows soon.

Signed-off-by: Filipe Manana
---
 fs/btrfs/inode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a21ad34..1225330 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4209,7 +4209,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	u64 extent_num_bytes = 0;
 	u64 extent_offset = 0;
 	u64 item_end = 0;
-	u64 last_size = (u64)-1;
+	u64 last_size = new_size;
 	u32 found_type = (u8)-1;
 	int found_extent;
 	int del_item;
@@ -4493,8 +4493,7 @@ out:
 		btrfs_abort_transaction(trans, root, ret);
 	}
 error:
-	if (last_size != (u64)-1 &&
-	    root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID)
+	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID)
 		btrfs_ordered_update_i_size(inode, last_size, NULL);
 
 	btrfs_free_path(path);
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
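A toy model of the control flow touched by the two hunks may help when reading them. Everything below is hypothetical shell, not kernel code: truncate_model, its arguments and the disk_i_size variable are made-up stand-ins, a file extent item is reduced to its end offset, and btrfs_ordered_update_i_size() is reduced to a plain assignment. It only illustrates the behaviour the commit message describes.

```shell
#!/bin/bash
# Toy model only: all names here are hypothetical stand-ins for the logic
# described in the commit message, not the kernel implementation.
set -eu

# truncate_model MODE NEW_SIZE [ITEM_END...]
# MODE is "old" (before the patch) or "new" (after it). Each ITEM_END is the
# end offset of one file extent item present in the fs/subvol tree.
truncate_model() {
	local mode=$1 new_size=$2
	shift 2
	local last_size=unset
	if [ "$mode" = new ]; then
		last_size=$new_size	# patched initial value
	fi
	local item_end
	for item_end in "$@"; do
		# Only an item reaching past the new size is dropped or
		# trimmed, and only then did the old code record a size.
		if [ "$item_end" -gt "$new_size" ]; then
			last_size=$new_size
		fi
	done
	# The old code skipped the update when nothing had been recorded.
	if [ "$last_size" != unset ]; then
		disk_i_size=$last_size
	fi
}

K=1024

# One persisted extent [0, 128K[. The data written at 256K is still only in
# the page cache, so there is no item for it. Truncate to 160K.
disk_i_size=$((128 * K))
truncate_model old $((160 * K)) $((128 * K))
old_result=$disk_i_size

disk_i_size=$((128 * K))
truncate_model new $((160 * K)) $((128 * K))
new_result=$disk_i_size

echo "old: disk_i_size=$old_result new: disk_i_size=$new_result"
```

In this model the old path leaves disk_i_size at 128K, which corresponds to the 0400000 end offset seen after the remount in the reproducer, while the patched path records 160K.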
Re: [PATCH 0/2] btrfs device remove alias
On Thu, Jun 25, 2015 at 03:41:51PM +0200, David Sterba wrote:
> On Wed, Jun 24, 2015 at 09:09:15AM -0700, Omar Sandoval wrote:
> > The opposite of btrfs device add is btrfs device delete. This really
> > should be btrfs device remove.
>
> I think people got used to the 'delete' command over time, but for
> convenience I don't mind adding the alias. Also, you delete files with
> 'rm', which is short for 'remove', and probably don't mind either.

I do agree that people got used to delete, but it's one of those things
that new users are likely to trip over. And since we're highly unlikely
to ever use 'rm' for something other than deletion, it makes sense to just
alias them.

-chris
Re: [PATCH 0/2] btrfs device remove alias
On Wed, Jun 24, 2015 at 09:09:15AM -0700, Omar Sandoval wrote:
> The opposite of btrfs device add is btrfs device delete. This really
> should be btrfs device remove.

I think people got used to the 'delete' command over time, but for
convenience I don't mind adding the alias. Also, you delete files with
'rm', which is short for 'remove', and probably don't mind either.
Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
On 2015-06-25 08:52, David Sterba wrote:
> On Wed, Jun 24, 2015 at 04:17:32PM -0400, Zygo Blaxell wrote:
> > Is there any sane use case where we would _want_ EXTENT_SAME to change
> > the mtime? We do a lot of work to make sure that none of the files
> > involved have any sort of content change. Why do we need the flag at all?
>
> Good point, I don't see the usecase for updating MTIME.

Was the original intent possibly to make certain the CTIME got updated?
Because EXTENT_SAME _does_ update the metadata, and by logical extension
that means that the CTIME should change.
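The distinction drawn here is between mtime (content changes) and ctime (inode metadata changes). A small sketch of that difference, assuming GNU touch and stat, with chmod used as a stand-in for a metadata-only change. It says nothing about what EXTENT_SAME itself does to either timestamp.

```shell
#!/bin/bash
# Sketch only (assumes GNU coreutils): a metadata-only change updates ctime
# and leaves mtime alone.
set -eu

f=$(mktemp)

# Backdate the modification time so the two timestamps are easy to tell apart.
touch -d '2015-06-25 00:00:00 UTC' "$f"
mtime_before=$(stat -c %Y "$f")

# Change the inode metadata without touching the file contents.
chmod 644 "$f"

mtime_after=$(stat -c %Y "$f")
ctime_after=$(stat -c %Z "$f")

echo "mtime before=$mtime_before after=$mtime_after ctime=$ctime_after"
rm -f "$f"
```

The mtime stays at the backdated value, while the ctime reflects the time of the chmod.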
Re: Btrfs progs release 4.1
On Wed, Jun 24, 2015 at 10:26:05PM +0200, Sjoerd wrote:
> Thanks for the link...I updated it...The updated version breaks the
> btrfsQuota.sh and btrfsQuota.py script from the wiki though :(

I see, but the scripts are likely to break at any time due to changes in
the qgroup output interfaces. You should rather take them as an example.
The preferred way is to enhance the tools, but this might be harder than
maintaining your own script.
Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
On Wed, Jun 24, 2015 at 04:17:32PM -0400, Zygo Blaxell wrote:
> Is there any sane use case where we would _want_ EXTENT_SAME to change
> the mtime? We do a lot of work to make sure that none of the files
> involved have any sort of content change. Why do we need the flag at all?

Good point, I don't see the use case for updating MTIME.
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
That's unfortunate. Many users, including me, started using btrfs by
converting from ext4. I hope this gets fixed.

Vytas

On Thu, Jun 25, 2015 at 5:16 AM, Marc MERLIN wrote:
> On Thu, Jun 18, 2015 at 02:05:04PM +0300, Robert Munteanu wrote:
>> On Wed, Jun 17, 2015 at 8:46 PM, Marc MERLIN wrote:
>> > On Fri, Jun 12, 2015 at 03:19:06PM +0300, Robert Munteanu wrote:
>> >> Hi,
>> >
>> > Note to others: kernel 4.0.4
>> >
>> > Reply to you:
>> > I tried ext4 to btrfs once a year ago and it severely mangled my
>> > filesystem.
>> > I looked at it as a cool feature/hack that may have worked some time
>> > ago, but that no one really uses anymore, and that may not work right
>> > at this point.
>> >
>> > Unless you hear back from a developer interested in debugging/fixing
>> > this, I would assume that this feature is broken and dead.
>>
>> I did hear, but in case the general consensus is that this feature is
>> broken/experimental/unsafe, it would be great to mention it in the
>> wiki page.
>
> Done
> https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901