[PATCH v2] btrfs-progs: skip fs with no seed when build seed/sprout mapping for fi show

2014-11-09 Thread Gui Hecheng
There is no need to try to build seed/sprout mapping for those btrfs
without seed devices, so just skip such fs.
We could get the total number of devices from the disk super block, if it
equals the number of items in list @fs_devices->devices, then there shouldn't
be any seed devices.

Signed-off-by: Gui Hecheng 
---
changlog
v1->v2: adopt more natural function name:
no_seed_devices ==> has_seed_devices
---
 cmds-filesystem.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index e4b2785..c55c486 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -750,6 +750,22 @@ static int find_and_copy_seed(struct btrfs_fs_devices 
*seed,
return 1;
 }
 
+static int has_seed_devices(struct btrfs_fs_devices *fs_devices)
+{
+   struct btrfs_device *device;
+   int dev_cnt_total, dev_cnt = 0;
+
+   device = list_first_entry(&fs_devices->devices, struct btrfs_device,
+ dev_list);
+
+   dev_cnt_total = device->total_devs;
+
+   list_for_each_entry(device, &fs_devices->devices, dev_list)
+   dev_cnt++;
+
+   return dev_cnt_total != dev_cnt;
+}
+
 static int map_seed_devices(struct list_head *all_uuids,
char *search, int *found)
 {
@@ -775,6 +791,10 @@ static int map_seed_devices(struct list_head *all_uuids,
*found = 1;
}
 
+   /* skip fs without seeds */
+   if (!has_seed_devices(cur_fs))
+   continue;
+
/* skip all fs already shown as mounted fs */
if (is_seen_fsid(cur_fs->fsid))
continue;
@@ -956,8 +976,8 @@ devs_only:
}
 
/*
-* scan_for_btrfs() don't build seed/sprout mapping,
-* do mapping build for each scanned fs here
+* The seed/sprout mapping are not detected yet,
+* do mapping build for all umounted fs
 */
ret = map_seed_devices(&all_uuids, search, &found);
if (ret) {
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs-progs: fix wrong num_devices for btrfs fi show with seed devices

2014-11-09 Thread Gui Hecheng
On Sat, 2014-11-08 at 19:47 +, Mike Fleetwood wrote:
> On 7 November 2014 18:16, David Sterba  wrote:
> > On Fri, Nov 07, 2014 at 10:07:43AM +0800, Gui Hecheng wrote:
> >> The @fi_args->num_devices in @get_fs_info() does not include seed devices.
> >> We could just correct it by searching the chunk tree and count how
> >> many dev_items there are in total which includes seed devices.
> >>
> >> Signed-off-by: Gui Hecheng 
> >> ---
> >> *Note*
> >> This is just a temporary workaround to fix this problem in order to
> >> make users happy, because a new ioctl or sysfs interface to handle this
> >> problem needs more discussions and efforts. After the work implemented
> >> and accepted, we could drop this.
> >
> > Nice, thanks. I agree that this kind of workaround is best possible for
> > the moment, and I'm glad to see that it's not that much code to get the
> > seeding devices right. This would also work with older kernels without
> > the updated sysfs/ioctl interfaces, so this is likely to stay for a long
> > time.
> >
> >> +u64 find_max_id(struct btrfs_ioctl_search_args *search_args, int nr_items)
> >
> > That's a very generic name for a function that does a very specialized
> > thing, but I don't have a suggestion right now.
> 
> Is find_max_device_id a suitable name?

I think this one is really more clear.

> >> +int correct_fs_info(int fd, struct btrfs_ioctl_fs_info_args *fi_args)
> >
> > Same here, make fs_info correct but in what way? A comment would be good
> > as well.
> 
> Sorry, no suggestion for this function name.

may be @search_chunk_tree_for_fs_info?

Thanks for the suggestions.

-Gui

> Thanks,
> Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix dead lock while running replace and defrag concurrently

2014-11-09 Thread Gui Hecheng
This can be reproduced by fstests: btrfs/070

The scenario is like the following:

replace worker thread   defrag thread
-   -
copy_nocow_pages_worker btrfs_defrag_file
  copy_nocow_pages_for_inode...
  btrfs_writepages
  |A| lock_extent_bits  extent_write_cache_pages
|B|   lock_page
__extent_writepage
...   writepage_delalloc
find_lock_delalloc_range
|B|   lock_extent_bits
  find_or_create_page
pagecache_get_page
  |A| lock_page

This leads to an ABBA pattern deadlock. To fix it,
o we just change it to an AABB pattern which means to @unlock_extent_bits()
  before we @lock_page(), and in this way the @extent_read_full_page_nolock()
  is no longer in an locked context, so change it back to 
@extent_read_full_page()
  to regain protection.

o Since we @unlock_extent_bits() earlier, then before @write_page_nocow(),
  the extent may not really point at the physical block we want, so we
  have to check it before write.

Signed-off-by: Gui Hecheng 
---
 fs/btrfs/scrub.c | 90 +---
 1 file changed, 60 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index efa0831..4325bb0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3310,6 +3310,50 @@ out:
scrub_pending_trans_workers_dec(sctx);
 }
 
+static int check_extent_to_block(struct inode *inode, u64 start, u64 len,
+u64 logical)
+{
+   struct extent_state *cached_state = NULL;
+   struct btrfs_ordered_extent *ordered;
+   struct extent_io_tree *io_tree;
+   struct extent_map *em;
+   u64 lockstart = start, lockend = start + len - 1;
+   int ret = 0;
+
+   io_tree = &BTRFS_I(inode)->io_tree;
+
+   lock_extent_bits(io_tree, lockstart, lockend, 0, &cached_state);
+   ordered = btrfs_lookup_ordered_range(inode, lockstart, len);
+   if (ordered) {
+   btrfs_put_ordered_extent(ordered);
+   ret = 1;
+   goto out_unlock;
+   }
+
+   em = btrfs_get_extent(inode, NULL, 0, start, len, 0);
+   if (IS_ERR(em)) {
+   ret = PTR_ERR(em);
+   goto out_unlock;
+   }
+
+   /*
+* This extent does not actually cover the logical extent anymore,
+* move on to the next inode.
+*/
+   if (em->block_start > logical ||
+   em->block_start + em->block_len < logical + len) {
+   free_extent_map(em);
+   ret = 1;
+   goto out_unlock;
+   }
+   free_extent_map(em);
+
+out_unlock:
+   unlock_extent_cached(io_tree, lockstart, lockend, &cached_state,
+GFP_NOFS);
+   return ret;
+}
+
 static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root,
  struct scrub_copy_nocow_ctx *nocow_ctx)
 {
@@ -3318,13 +3362,10 @@ static int copy_nocow_pages_for_inode(u64 inum, u64 
offset, u64 root,
struct inode *inode;
struct page *page;
struct btrfs_root *local_root;
-   struct btrfs_ordered_extent *ordered;
-   struct extent_map *em;
-   struct extent_state *cached_state = NULL;
struct extent_io_tree *io_tree;
u64 physical_for_dev_replace;
+   u64 nocow_ctx_logical;
u64 len = nocow_ctx->len;
-   u64 lockstart = offset, lockend = offset + len - 1;
unsigned long index;
int srcu_index;
int ret = 0;
@@ -3356,30 +3397,13 @@ static int copy_nocow_pages_for_inode(u64 inum, u64 
offset, u64 root,
 
physical_for_dev_replace = nocow_ctx->physical_for_dev_replace;
io_tree = &BTRFS_I(inode)->io_tree;
+   nocow_ctx_logical = nocow_ctx->logical;
 
-   lock_extent_bits(io_tree, lockstart, lockend, 0, &cached_state);
-   ordered = btrfs_lookup_ordered_range(inode, lockstart, len);
-   if (ordered) {
-   btrfs_put_ordered_extent(ordered);
-   goto out_unlock;
-   }
-
-   em = btrfs_get_extent(inode, NULL, 0, lockstart, len, 0);
-   if (IS_ERR(em)) {
-   ret = PTR_ERR(em);
-   goto out_unlock;
-   }
-
-   /*
-* This extent does not actually cover the logical extent anymore,
-* move on to the next inode.
-*/
-   if (em->block_start > nocow_ctx->logical ||
-   em->block_start + em->block_len < nocow_ctx->logical + len) {
-   free_extent_map(em);
-   goto out_unlock;
+   ret = check_extent_to_block(inode, offset, len, nocow_ctx_logical);
+   if (ret) {
+   ret = ret > 0 ? 0 : ret;
+   goto out;
}
-   free_extent_map(em);
 
while (len >=

Re: [PATCH] Btrfs: fix incorrect compression ratio detection

2014-11-09 Thread Miao Xie
On Tue, 7 Oct 2014 18:44:35 -0400, Wang Shilong wrote:
> Steps to reproduce:
>  # mkfs.btrfs -f /dev/sdb
>  # mount -t btrfs /dev/sdb /mnt -o compress=lzo
>  # dd if=/dev/zero of=/mnt/data bs=$((33*4096)) count=1
> 
> after previous steps, inode will be detected as bad compression ratio,
> and NOCOMPRESS flag will be set for that inode.
> 
> Reason is that compress have a max limit pages every time(128K), if a
> 132k write in, it will be splitted into two write(128k+4k), this bug
> is a leftover for commit 68bb462d42a(Btrfs: don't compress for a small write)
> 
> Fix this problem by checking every time before compression, if it is a
> small write(<=blocksize), we bail out and fall into nocompression directly.
> 
> Signed-off-by: Wang Shilong 

Looks good.

Reviewed-by: Miao Xie 

> ---
>  fs/btrfs/inode.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 344a322..b78e90a 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -411,14 +411,6 @@ static noinline int compress_file_range(struct inode 
> *inode,
>   (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size))
>   btrfs_add_inode_defrag(NULL, inode);
>  
> - /*
> -  * skip compression for a small file range(<=blocksize) that
> -  * isn't an inline extent, since it dosen't save disk space at all.
> -  */
> - if ((end - start + 1) <= blocksize &&
> - (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size))
> - goto cleanup_and_bail_uncompressed;
> -
>   actual_end = min_t(u64, isize, end + 1);
>  again:
>   will_compress = 0;
> @@ -440,6 +432,14 @@ again:
>  
>   total_compressed = actual_end - start;
>  
> + /*
> +  * skip compression for a small file range(<=blocksize) that
> +  * isn't an inline extent, since it dosen't save disk space at all.
> +  */
> + if (total_compressed <= blocksize &&
> +(start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size))
> + goto cleanup_and_bail_uncompressed;
> +
>   /* we want to make sure that amount of ram required to uncompress
>* an extent is reasonable, so we limit the total size in ram
>* of a compressed extent to 128k.  This is a crucial number
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: corruption, bad block, input/output errors - do i run --repair?

2014-11-09 Thread Qu Wenruo


 Original Message 
Subject: corruption, bad block, input/output errors - do i run --repair?
From: Matt McKinnon 
To: 
Date: 2014年11月07日 22:33

Hi All,

I'm running into some corruption and I wanted to seek out advice on 
whether or not to run btrfs check --repair, or if I should fall back 
to my backup file server, or both.


The system is mountable, and usable.

# uname -a
Linux cbmm-fs 3.17.2-custom #1 SMP Thu Oct 30 14:09:57 EDT 2014 x86_64 
x86_64 x86_64 GNU/Linux


# btrfs --version
Btrfs v3.14.2
# btrfs fi show
Label: none  uuid: 30c15060-8fb4-4926-87d4-f7d08c3033c5
Total devices 1 FS bytes used 58.92TiB
devid1 size 76.40TiB used 59.05TiB path /dev/sda1

# btrfs fi df /home
Data, single: total=58.75TiB, used=58.75TiB
System, DUP: total=32.00MiB, used=2.66MiB
System, single: total=4.00MiB, used=3.68MiB
Metadata, DUP: total=119.00GiB, used=116.63GiB
Metadata, single: total=64.01GiB, used=57.68GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


I did run into some RO snapshot corruption which caused me to run 
btrfs check:

This is a known bug in 3.17 with RO snapshot.
It's fixable but not with your old 3.14 btrfs-progs.

Please update to 3.17 btrfs-progs, and re-run btrfsck(without --repair), 
and it will prompt you to

use --repair if this is the exact bug.
Then run with --repair should fix it.

Thanks,
Qu


parent transid verify failed on 20809493159936 wanted 
4486137218058286914 found

390978
parent transid verify failed on 20809493159936 wanted 
4486137218058286914 found

390978
Ignoring transid failure
Checking filesystem on /dev/sda1
UUID: 30c15060-8fb4-4926-87d4-f7d08c3033c5
checking extents
bad block 69290357067776
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots

...

"dir isize wrong" 1 error
"errors 500, file extent discount, nbytes wrong" 14 errors
"errors 2001, no inode item, link count wrong" 257302 errors

...

found 185063071745 bytes used err is 1
total csum bytes: 8428
total tree bytes: 1889284096
total fs tree bytes: 962678784
total extent tree bytes: 159297536
btree space waste bytes: 340014684
file data blocks allocated: 57344
 referenced 57344
Btrfs v3.14.2

Output of a scrub:

ERROR: scrubbing /home failed for device id 1 (Input/output error)
scrub canceled for 30c15060-8fb4-4926-87d4-f7d08c3033c5
scrub started at Mon Nov  3 06:43:58 2014 and was aborted 
after 7613 seconds

data_extents_scrubbed: 248507555
tree_extents_scrubbed: 10870729
data_bytes_scrubbed: 15375990317056
tree_bytes_scrubbed: 44526505984
read_errors: 0
csum_errors: 0
verify_errors: 0
no_csum: 15712
csum_discards: 988018
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 15425663205376

Output of a balance:

ERROR: error during balancing '/home' - Input/output error
There may be more info in syslog - try dmesg | tail

[501087.506642] [ cut here ]
[501087.543971] WARNING: CPU: 5 PID: 31885 at 
fs/btrfs/relocation.c:925 build_backref_tree+0x11f0/0x1230 [btrfs]()
[501087.543991] Modules linked in: ipmi_devintf(E) autofs4(E) 
sb_edac(E) edac_core(E) joydev(E) mei_me(E) mei(E) lpc_ich(E) 
ioatdma(E) ipmi_si(E) wmi(E) mac_hid(E) bnep(E) rfcomm(E) bluetooth(E) 
lp(E) parport(E) nfsd(E) nfs_acl(E) auth_rpcgss(E) nfs(E) fscache(E) 
lockd(E) sunrpc(E) ses(E) enclosure(E) hid_generic(E) ahci(E) 
libahci(E) usbhid(E) hid(E) igb(E) dca(E) i2c_algo_bit(E) ptp(E) 
pps_core(E) megaraid_sas(E) btrfs(E) raid6_pq(E) xor(E) libcrc32c(E)
[501087.543995] CPU: 5 PID: 31885 Comm: btrfs Tainted: G D E 
3.17.2-custom #1
[501087.543997] Hardware name: Supermicro 
X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0a 12/27/2013
[501087.543999]  039d 88000eadb808 8176733c 
0282
[501087.544001]   88000eadb848 8107163c 
1000
[501087.544003]  8801d0d9acf0 880497c70380 0001 
0001

[501087.544004] Call Trace:
[501087.544014]  [] dump_stack+0x46/0x58
[501087.544022]  [] warn_slowpath_common+0x8c/0xc0
[501087.544024]  [] warn_slowpath_null+0x1a/0x20
[501087.544039]  [] build_backref_tree+0x11f0/0x1230 
[btrfs]
[501087.544052]  [] relocate_tree_blocks+0x2d1/0x690 
[btrfs]

[501087.544060]  [] ? kmem_cache_alloc_trace+0x39/0x1f0
[501087.544072]  [] relocate_block_group+0x202/0x5f0 
[btrfs]
[501087.544083]  [] 
btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
[501087.544098]  [] 
btrfs_relocate_chunk.isra.62+0x75/0x760 [btrfs]
[501087.544111]  [] ? 
release_extent_buffer+0x36/0xe0 [btrfs]
[501087.544124]  [] ? free_extent_buffer+0x61/0xc0 
[btrfs]

[501087.544136]  [] btrfs_balance+0x8ab/0xf50 [btrfs]
[501087.544150]  [] btrfs_ioctl_balance+0x1cc/0x530 
[btrfs]
[501087.544156]  [] ? 
lru_cache_add_active_or_unevictable+0x2b/0xa0

[501087.544168]  

Re: [PATCH] fstests: btrfs, add regression test for clone ioctl

2014-11-09 Thread Dave Chinner
Btrfs folks, review please...

.

On Tue, Oct 21, 2014 at 09:43:11PM +0100, Filipe Manana wrote:
> Regression test for a btrfs clone ioctl issue where races between
> a clone operation and concurrent target file reads would result in
> leaving stale data in the page cache. After the clone operation
> finished, reading from the clone target file would return the old
> and no longer valid data. This affected only buffered reads (i.e.
> didn't affect direct IO reads).
> 
> This issue was fixed by the following linux kernel patch:
> 
> Btrfs: ensure readers see new data after a clone operation
> (commit c125b8bff1d9f6c8c91ce4eb8bd5616058c7d510)
> 
> Signed-off-by: Filipe Manana 
> ---
>  tests/btrfs/081 | 131 
> 
>  tests/btrfs/081.out |   4 ++
>  tests/btrfs/group   |   1 +
>  3 files changed, 136 insertions(+)
>  create mode 100755 tests/btrfs/081
>  create mode 100644 tests/btrfs/081.out
> 
> diff --git a/tests/btrfs/081 b/tests/btrfs/081
> new file mode 100755
> index 000..d2e3767
> --- /dev/null
> +++ b/tests/btrfs/081
> @@ -0,0 +1,131 @@
> +#! /bin/bash
> +# FSQA Test No. 081
> +#
> +# Regression test for a btrfs clone ioctl issue where races between
> +# a clone operation and concurrent target file reads would result in
> +# leaving stale data in the page cache. After the clone operation
> +# finished, reading from the clone target file would return the old
> +# and no longer valid data. This affected only buffered reads (i.e.
> +# didn't affect direct IO reads).
> +#
> +# This issue was fixed by the following linux kernel patch:
> +#
> +# Btrfs: ensure readers see new data after a clone operation
> +# (commit c125b8bff1d9f6c8c91ce4eb8bd5616058c7d510)
> +#
> +#---
> +#
> +# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_require_btrfs_cloner
> +
> +rm -f $seqres.full
> +
> +num_extents=100
> +extent_size=8192
> +
> +create_source_file()
> +{
> + name=$1
> +
> + # Create a file with $num_extents extents, each with a size of
> + # $extent_size bytes.
> + touch $SCRATCH_MNT/$name
> + for ((i = 0; i < $num_extents; i++)); do
> + off=$((i * $extent_size))
> + run_check $XFS_IO_PROG \
> + -c "pwrite -S $i -b $extent_size $off $extent_size" \
> + -c "fsync" $SCRATCH_MNT/$name

xfs_io dumps errors into the output file and that causes failures.
there is no need to use run_check here.

-Dave.
> + done
> +}
> +
> +create_target_file()
> +{
> + name=$1
> + file_size=$(($num_extents * $extent_size))
> +
> + run_check $XFS_IO_PROG -f -c "pwrite -S 0xff 0 $file_size" \
> + -c "fsync" $SCRATCH_MNT/$name
> +}
> +
> +reader_loop()
> +{
> + name=$1
> +
> + while true; do
> + cat $SCRATCH_MNT/$name > /dev/null
> + done
> +}
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +
> +create_source_file "foo"
> +create_target_file "bar"
> +
> +reader_loop "bar" &
> +reader_pid=$!
> +
> +$CLONER_PROG -s 0 -d 0 -l $(($num_extents * $extent_size)) \
> + $SCRATCH_MNT/foo $SCRATCH_MNT/bar
> +
> +kill $reader_pid > /dev/null 2>&1
> +
> +# Now both foo and bar should have exactly the same content.
> +# This didn't use to be the case before the btrfs kernel fix mentioned
> +# above. The clone ioctl was racy, as it removed bar's pages from the
> +# page cache and only after it would update bar's metadata to point to
> +# the same extents that foo's metadata points to - and this was done in
> +# an unprotected way, so that a file read request done right after the
> +# clone ioctl removed the pages from the page cache and 

Re: [PATCH v2] fstests: btrfs, add test for snapshoting after file write + truncate

2014-11-09 Thread Dave Chinner
Ping for Btrfs folks, review please...

On Sun, Oct 26, 2014 at 11:39:03AM +, Filipe Manana wrote:
> Regression test for a btrfs issue where if right after the snapshot
> creation ioctl started, a file write followed by a file truncate
> happened, with both operations increasing the file's size, the created
> snapshot would capture an inconsistent state of the file system tree.
> That state reflected the file truncation but it didn't reflect the
> write operation, and left a gap between two file extent items (and
> that gap corresponded to the total or a partial area of the write
> operation's range).
> 
> This issue was fixed by the following linux kernel patch:
> 
> Btrfs: fix snapshot inconsistency after a file write followed by truncate
> 
> Signed-off-by: Filipe Manana 
> ---
> 
> V2: Added some background processes to cause some cpu load. This makes the
> test fail always on environments with a non-debug kernel and where no
> other significant load (other the test itself) is running.
> 
>  tests/btrfs/080 | 169 
> 
>  tests/btrfs/080.out |   2 +
>  tests/btrfs/group   |   1 +
>  3 files changed, 172 insertions(+)
>  create mode 100755 tests/btrfs/080
>  create mode 100644 tests/btrfs/080.out
> 
> diff --git a/tests/btrfs/080 b/tests/btrfs/080
> new file mode 100755
> index 000..a5d3b38
> --- /dev/null
> +++ b/tests/btrfs/080
> @@ -0,0 +1,169 @@
> +#! /bin/bash
> +# FSQA Test No. 080
> +#
> +# Regression test for a btrfs issue where if right after the snapshot 
> creation
> +# ioctl started, a file write followed by a file truncate happened, with both
> +# operations increasing the file's size, the created snapshot would capture 
> an
> +# inconsistent state of the file system tree. That state reflected the file
> +# truncation but it didn't reflect the write operation, and left a gap 
> between
> +# two file extent items (and that gap corresponded to the total or a partial
> +# area of the write operation's range).
> +#
> +# This issue was fixed by the following linux kernel patch:
> +#
> +# Btrfs: fix snapshot inconsistency after a file write followed by 
> truncate
> +#
> +#---
> +#
> +# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + for p in ${cpu_stress_pids[*]}; do
> + kill $p &> /dev/null
> + done
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_nocheck
> +
> +rm -f $seqres.full
> +
> +create_snapshot()
> +{
> + local ts=`date +'%H_%M_%S_%N'`
> +
> + _run_btrfs_util_prog subvolume snapshot -r \
> + $SCRATCH_MNT $SCRATCH_MNT/"${ts}_snap"
> +}
> +
> +create_file()
> +{
> + local name=$1
> +
> + run_check $XFS_IO_PROG -f \
> + -c "pwrite -S 0xaa -b 32K 0 32K" \
> + -c "fsync" \
> + -c "pwrite -S 0xbb -b 32770 16K 32770" \
> + -c "truncate 90123" \
> + $SCRATCH_MNT/$name
> +}
> +
> +workout()
> +{
> + local name=$1
> +
> + create_file $name &
> + fpid=$!
> + create_snapshot &
> + spid=$!
> + wait $fpid
> + create_ret=$?
> + wait $spid
> + snap_ret=$?
> + if [ $create_ret != 0 -o $snap_ret != 0 ]; then
> + _fail "Failure creating file or snapshot, check $seqres.full 
> for details"
> + fi
> +}
> +
> +# If the installed btrfs mkfs supports the no-holes feature, make sure the
> +# created fs doesn't get that feature enabled. With it enabled, the below 
> fsck
> +# call wouldn't fail. This feature hasn't been enabled by default since it 
> was
> +# introduced, but be safe and explicitly disable it.
> +_scratch_mkfs -O list-all 2>&1 | grep -q '\bno\-holes\b'
> +if [ $? -eq 0 ]; then
> + mkfs_options="-O ^no-holes"
>

Re: [PATCH] btrfs: regression test for btrfs extent merge

2014-11-09 Thread Dave Chinner
On Thu, Oct 16, 2014 at 04:32:06PM +0800, Qu Wenruo wrote:
>  Original Message 
> Subject: Re: [PATCH] btrfs: regression test for btrfs extent merge
> From: Eryu Guan 
> >>+
> >>+tmp=`mktemp -d`
> >I'm a bit concerned about this. Some functions in common/rc use
> >$tmp.some_sufix as temp file. If $tmp defined as a dir here, I'm not
> >sure if it will cause any trouble.
> It seems that most tests uses /tmp/$$ as tmp, only some exception
> like btrfs/077 uses mktemp -d.
> So I'd better use the old /tmp/$$ method.

Can you please send a patch to fix btrfs/077 as well?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstests: add generic test to verify xattr replace operations are atomic

2014-11-09 Thread Dave Chinner
On Mon, Nov 10, 2014 at 12:49:12AM +, Filipe David Manana wrote:
> On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner  wrote:
> > On Fri, Nov 07, 2014 at 08:40:26PM +, Filipe Manana wrote:
> >> This test verifies that replacing a xattr's value is an atomic
> >> operation. This is motivated by an issue in btrfs where replacing
> >> a xattr's value wasn't an atomic operation, it consisted of
> >> removing the old value and then inserting the new value in a
> >> btree. This made readers (getxattr and listxattrs) not getting
> >> neither the old nor the new value during a short time window.
> >
> > OK, seems like a good thing to test that the application can only
> > see the old or the new value.
> >
> > However, I can't help but wonder about whether the btrfs behaviour
> > is crash safe as it wasn't designed to be atomic from the ground up.
> > i.e. if the system crashes half way through a attribute overwrite,
> > what does btrfs end up with as a result? XFS is guaranteed at a
> > transactional level to return either the old or the new value,
> > depending on where in the operaiton the crash occurred, but I'd just
> > assumed that everyone did attribute replace atomically so it never
> > occurred to me that it might be an issue...
> 
> It's crash safe. Both the delete and insert were done in the same
> transaction, so a crash in between both operations (or after both and
> before the transaction commit) would result in always seeing the old
> value (or better saying, the last persisted value by a transaction
> commit or fsync).

Alright, so no crash issues because all the modifications are in a
single transaction. However, if both modifications are made in the
same transaction, then this bug implies that a user can read a
metadata object in the btree whilst somethign else is concurrently
modifying it, right? i.e. that there is no serialisation between
inode metadata reads and transactional modification operations?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] fstests: add generic test to verify xattr replace operations are atomic

2014-11-09 Thread Filipe Manana
This test verifies that replacing a xattr's value is an atomic
operation. This is motivated by an issue in btrfs where replacing
a xattr's value wasn't an atomic operation, it consisted of
removing the old value and then inserting the new value in a
btree. This made readers (getxattr and listxattrs) not getting
neither the old nor the new value during a short time window.

Signed-off-by: Filipe Manana 
---

V2: Several changes after Dave's review. Renamed test file, added test to the
metadata category and filtered xattr value out of getfattr to include it
in the golden output.

 tests/generic/036 |  102 +
 tests/generic/036.out | 1001 +
 tests/generic/group   |1 +
 3 files changed, 1104 insertions(+)
 create mode 100755 tests/generic/036
 create mode 100644 tests/generic/036.out

diff --git a/tests/generic/036 b/tests/generic/036
new file mode 100755
index 000..ff5bef7
--- /dev/null
+++ b/tests/generic/036
@@ -0,0 +1,102 @@
+#! /bin/bash
+# FSQA Test No. 036
+#
+# Verify that replacing a xattr's value is an atomic operation.
+# This is motivated by an issue in btrfs where replacing a xattr's value
+# wasn't an atomic operation, it consisted of removing the old value and
+# then inserting the new value in a btree. This made readers (getxattr
+# and listxattrs) not getting neither the old nor the new value during
+# a short time window.
+#
+# The btrfs issue was fixed by the following linux kernel patch:
+#
+#Btrfs: make xattr replace operations atomic
+#
+#---
+#
+# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   kill $setter_pid
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_attrs
+
+rm -f $seqres.full
+
+xattr_name="user.something"
+xattr_value1="foobar"
+xattr_value2="rabbit_hole"
+
+set_xattr_loop()
+{
+   local name=$1
+   local cur_val=$xattr_value1
+   while true; do
+   $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name
+   if [ "$cur_val" == "$xattr_value1" ]; then
+   cur_val=$xattr_value2
+   else
+   cur_val=$xattr_value1
+   fi
+   done
+}
+
+value_filter()
+{
+   grep "$xattr_name=" | cut -d '=' -f 2 | \
+   sed -e "s/$xattr_value1/GOOD/" -e "s/$xattr_value2/GOOD/"
+}
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+test_file="test_xattr_replace"
+touch $SCRATCH_MNT/$test_file
+$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file
+
+set_xattr_loop $test_file &
+setter_pid=$!
+
+for ((i = 0; i < 1000; i++)); do
+   $GETFATTR_PROG --absolute-names -n $xattr_name \
+   $SCRATCH_MNT/$test_file | value_filter
+done
+
+status=0
+exit
diff --git a/tests/generic/036.out b/tests/generic/036.out
new file mode 100644
index 000..c440719
--- /dev/null
+++ b/tests/generic/036.out
@@ -0,0 +1,1001 @@
+QA output created by 036
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD"
+"GOOD

Re: [PATCH] fstests: add generic test to verify xattr replace operations are atomic

2014-11-09 Thread Filipe David Manana
On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner  wrote:
> On Fri, Nov 07, 2014 at 08:40:26PM +, Filipe Manana wrote:
>> This test verifies that replacing a xattr's value is an atomic
>> operation. This is motivated by an issue in btrfs where replacing
>> a xattr's value wasn't an atomic operation, it consisted of
>> removing the old value and then inserting the new value in a
>> btree. This made readers (getxattr and listxattrs) not getting
>> neither the old nor the new value during a short time window.
>
> OK, seems like a good thing to test that the application can only
> see the old or the new value.
>
> However, I can't help but wonder about whether the btrfs behaviour
> is crash safe as it wasn't designed to be atomic from the ground up.
> i.e. if the system crashes half way through a attribute overwrite,
> what does btrfs end up with as a result? XFS is guaranteed at a
> transactional level to return either the old or the new value,
> depending on where in the operaiton the crash occurred, but I'd just
> assumed that everyone did attribute replace atomically so it never
> occurred to me that it might be an issue...

It's crash safe. Both the delete and insert were done in the same
transaction, so a crash in between both operations (or after both and
before the transaction commit) would result in always seeing the old
value (or better saying, the last persisted value by a transaction
commit or fsync).

>
> Perhaps another test involving dm-flakey to trigger errors and
> shutdowns whilst in the middle of replacing attributes might be a
> good thing?
>
> []
>
>> +_cleanup()
>> +{
>> + if [ ! -z $setter_pid ]; then
>> + kill $setter_pid &> /dev/null
>> + fi
>
> No need to do this if[] then. You're already redirecting all errors
> to /dev/null, so if $setter_pid is empty it will just output the
> help string.
>
>> + rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +. ./common/attr
>> +
>> +# real QA test starts here
>> +_need_to_be_root
>> +_supported_fs generic
>> +_supported_os Linux
>> +_require_scratch
>> +_require_attrs
>> +
>> +rm -f $seqres.full
>> +
>> +xattr_name="user.something"
>> +xattr_value1="foobar"
>> +xattr_value2="rabbit_hole"
>> +
>> +set_xattr_loop()
>> +{
>> + local name=$1
>> +
>> + local cur_val=$xattr_value1
>> + while true; do
>> + $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name
>> + if [ "$cur_val" == "$xattr_value1" ]; then
>> + cur_val=$xattr_value2
>> + else
>> + cur_val=$xattr_value1
>> + fi
>> + done
>> +}
>> +
>> +_scratch_mkfs >>$seqres.full 2>&1
>> +_scratch_mount
>> +
>> +test_file="test_xattr_replace"
>> +touch $SCRATCH_MNT/$test_file
>> +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file
>> +
>> +set_xattr_loop $test_file &
>> +setter_pid=$!
>> +
>> +for ((i = 0; i < 1000; i++)); do
>> + xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \
>> + $SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f 
>> 2)
>> + if [ "$xattr_val" != "\"$xattr_value1\"" -a \
>> + "$xattr_val" != "\"$xattr_value2\"" ]; then
>> + _fail "Missing or unexpected xattr value: $xattr_val"
>> + fi
>
> I'd suggest that a filter is a much better way to do this.
>
> name_filter() {
> grep "$xattr_name=" | cut -d '=' -f 2 | \
> sed -e "s/$xattr_value1/GOOD/" \
> -e "s/$xattr_value2/GOOD/"
> }
>
> for (); do
> $GETFATTR_PROG --absolute-names -n $xattr_name \
> $SCRATCH_MNT/$test_file | name_filter
> done
>
> And so the output file should just be:
>
> GOOD
> GOOD
> 
>
> And if it is bad, then it outputs the bad attribute value instead
> of "GOOD", and the test fails on the golden output match.
>
>> +kill $setter_pid &> /dev/null
>> +unset setter_pid
>
> See above, no need to unset the var.
>> +
>> +echo "Silence is golden"
>> +status=0
>> +exit
>> diff --git a/tests/generic/326.out b/tests/generic/326.out
>> new file mode 100644
>> index 000..4ac0db5
>> --- /dev/null
>> +++ b/tests/generic/326.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 326
>> +Silence is golden
>
> Except when you are using comparisons and "_fail" instead of
> filters. ;)
>
> BTW, new tests should be numbere as the first unused number, not at
> the tail (i.e generic/036). If there are collisions with other new
> tests when I merge them, I'll renumber them at merge time.
>
>>  323 auto aio stress
>>  324 auto fsr quick
>>  325 auto quick data log
>> +326 auto quick xattr
>
> There is no existing xattr group - normally xattr tests are
> considered part of the "metadata" group. If you want to introduce an
> xattr test group, do it as a separate patch and include all the
> tests (across all the test directories) that manipulate xattrs in
> some way.

Agreed.
V2 follo

Re: [PATCH] fstests: add generic test to verify xattr replace operations are atomic

2014-11-09 Thread Dave Chinner
On Fri, Nov 07, 2014 at 08:40:26PM +, Filipe Manana wrote:
> This test verifies that replacing a xattr's value is an atomic
> operation. This is motivated by an issue in btrfs where replacing
> a xattr's value wasn't an atomic operation, it consisted of
> removing the old value and then inserting the new value in a
> btree. This made readers (getxattr and listxattrs) not getting
> neither the old nor the new value during a short time window.

OK, seems like a good thing to test that the application can only
see the old or the new value.

However, I can't help but wonder about whether the btrfs behaviour
is crash safe as it wasn't designed to be atomic from the ground up.
i.e. if the system crashes half way through a attribute overwrite,
what does btrfs end up with as a result? XFS is guaranteed at a
transactional level to return either the old or the new value,
depending on where in the operaiton the crash occurred, but I'd just
assumed that everyone did attribute replace atomically so it never
occurred to me that it might be an issue...

Perhaps another test involving dm-flakey to trigger errors and
shutdowns whilst in the middle of replacing attributes might be a
good thing?

[]

> +_cleanup()
> +{
> + if [ ! -z $setter_pid ]; then
> + kill $setter_pid &> /dev/null
> + fi

No need to do this if[] then. You're already redirecting all errors
to /dev/null, so if $setter_pid is empty it will just output the
help string.

> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/attr
> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_attrs
> +
> +rm -f $seqres.full
> +
> +xattr_name="user.something"
> +xattr_value1="foobar"
> +xattr_value2="rabbit_hole"
> +
> +set_xattr_loop()
> +{
> + local name=$1
> +
> + local cur_val=$xattr_value1
> + while true; do
> + $SETFATTR_PROG -n $xattr_name -v $cur_val $SCRATCH_MNT/$name
> + if [ "$cur_val" == "$xattr_value1" ]; then
> + cur_val=$xattr_value2
> + else
> + cur_val=$xattr_value1
> + fi
> + done
> +}
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +
> +test_file="test_xattr_replace"
> +touch $SCRATCH_MNT/$test_file
> +$SETFATTR_PROG -n $xattr_name -v $xattr_value1 $SCRATCH_MNT/$test_file
> +
> +set_xattr_loop $test_file &
> +setter_pid=$!
> +
> +for ((i = 0; i < 1000; i++)); do
> + xattr_val=$($GETFATTR_PROG --absolute-names -n $xattr_name \
> + $SCRATCH_MNT/$test_file | grep "$xattr_name=" | cut -d '=' -f 2)
> + if [ "$xattr_val" != "\"$xattr_value1\"" -a \
> + "$xattr_val" != "\"$xattr_value2\"" ]; then
> + _fail "Missing or unexpected xattr value: $xattr_val"
> + fi

I'd suggest that a filter is a much better way to do this.

name_filter() {
grep "$xattr_name=" | cut -d '=' -f 2 | \
sed -e "s/$xattr_value1/GOOD/" \
-e "s/$xattr_value2/GOOD/"
}

for (); do
$GETFATTR_PROG --absolute-names -n $xattr_name \
$SCRATCH_MNT/$test_file | name_filter
done

And so the output file should just be:

GOOD
GOOD


And if it is bad, then it outputs the bad attribute value instead
of "GOOD", and the test fails on the golden output match.

> +kill $setter_pid &> /dev/null
> +unset setter_pid

See above, no need to unset the var.
> +
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/generic/326.out b/tests/generic/326.out
> new file mode 100644
> index 000..4ac0db5
> --- /dev/null
> +++ b/tests/generic/326.out
> @@ -0,0 +1,2 @@
> +QA output created by 326
> +Silence is golden

Except when you are using comparisons and "_fail" instead of
filters. ;)

BTW, new tests should be numbere as the first unused number, not at
the tail (i.e generic/036). If there are collisions with other new
tests when I merge them, I'll renumber them at merge time.

>  323 auto aio stress
>  324 auto fsr quick
>  325 auto quick data log
> +326 auto quick xattr

There is no existing xattr group - normally xattr tests are
considered part of the "metadata" group. If you want to introduce an
xattr test group, do it as a separate patch and include all the
tests (across all the test directories) that manipulate xattrs in
some way.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] add progress indicator to btrfs-convert

2014-11-09 Thread Silvio Fricke
Hi btrfs-ml,

I have tried a ext4->btrfs conversation of a 1TB (90%full) and have seen that
btrfs has no progressbar or something like that. Furthermore I have found [1] a
project idea for this.

I have tried a generic tasklet-based solution to print the status of the
convert-process. Maybe it looks good to you than please add this to your repo.
I have implemented this with the linux specific timerfd facility. To enable
this you haver to use the '-p' option.

I have noticed that a conversation doesn't show all inodes as converted at the
end. How can I improve this? A complete convert-process is showd here:

sfr@devhost ./btrfs-progs % ./btrfs-convert -p /dev/sde4
creating btrfs metadata.
copy inodes [o] [96/   106]
creating ext2fs image file.
cleaning up system chunk.
conversion complete.

[1] https://btrfs.wiki.kernel.org/index.php/Project_ideas#btrfs-convert

Bye and thanks,
Silvio


Silvio Fricke (2):
  btrfs-progs: add task-utils
  btrfs-progs: convert: use task for progress indication of metadata
creation

 Documentation/btrfs-convert.txt |   2 +
 Makefile|   6 +--
 btrfs-convert.c |  60 ++--
 task-util.c | 101 
 task-util.h |  32 +
 5 files changed, 193 insertions(+), 8 deletions(-)
 create mode 100644 task-util.c
 create mode 100644 task-util.h

-- 
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Quota question

2014-11-09 Thread Cyril Scetbon
Hey,

I did some new tests and the issue with quota exceeded comes from the fact that 
the data is not sync to disk : http://pastebin.com/fZdh2YRu

-- 
Cyril SCETBON

> On 09 Nov 2014, at 18:55, Cyril Scetbon  wrote:
> 
> 
> -- 
> Cyril SCETBON
> 
>> 1)  but is the rewrite in that, or in 
>> 3.17, or in the coming 3.18, or not yet entirely there at all... that I 
>> simply don't know.
> 
> If anyone can answer it'd be great !
> 
>> 
>> 2) What I *DO* know and *CAN* say with quite some confidence is this:
>> 
>> With btrfs being in general a copy-on-write filesystem, even deletes 
>> normally require /some/ space, because the metadata pointing at the file 
>> in question must be rewritten, and that requires free space in ordered to 
>> happen.
> ok
>> 
>> At least in the case of full filesystem ENOSPC errors, the space-related 
>> questions in the FAQ on the btrfs wiki suggest truncating a file first.  
>> Find an existing file that either doesn't matter or that you have a 
>> backup of elsewhere, and echo > filename (or echo >| filename if you have 
>> noclobber set), therefore truncating the existing file.  A few of those 
>> and with luck you'll free enough space to do some actual deletions.
>> 
>> While to some extent I'm guessing here since I'm not directly familiar 
>> with quotas myself, it sounds as if such a severe ENOSPC condition, due 
>> in your case to quotas instead of the general filesystem limits I'm more 
>> familiar with, might be what you are seeing, particularly since in your 
>> tests you first triggered the limit, then found a way to write a bit more 
>> anyway, and only THEN found yourself unable to write anything.
>> 
>> If that's the case, then try the truncate trick and see if it helps.
> Unfortunately : http://pastebin.com/1Mw0yiUc
>> 
>> In the generic filesystem case, if the truncate trick doesn't help, 
>> another possible trick is to temporarily add another device of several 
>> gigs capacity to the filesystem, enough to let a balance work and 
>> hopefully reduce the data/metadata imbalance, after which there should be 
>> enough chunk-unallocated free space left on the original device, to do a 
>> btrfs device delete of the temporary device, generally still leaving 
>> space left, since the balance should have reclaimed some otherwise wasted 
>> chunk allocation.
>> 
>> I can't see how that'd apply to quotas, however, and suspect that the 
>> corresponding quota-system solution would be to temporarily up the quota 
>> for the affected user, only long enough to let them do the deletions they 
>> need to do.  But beyond that I can't go, since I simply don't know enough 
>> about the quota domain to further speculate, at this point.
> 
> I'm pretty sure it'd work by setting a temporary higher quota, but my 
> questions are : 
> 
> - why can I exceed the quotas (580MB when 500MB was set as the limit) ?
> 
> qgroupid  exclmax_excl
> ---
> 1/101999 608632832 524288000
> 
> - I really think that the quota limit should take into account an amount of 
> space for operations that would free space
> 
> For this last question I may think that this issue is caused by the fact that 
> I have only one file taking all the space which won't really be the regular 
> use case. but i'd like to get an answer :)
>> 
>> While admittedly limited due to my lack of quota-domain knowledge, 
>> perhaps that sheds at least some light on what happened and why, and 
>> hopefully on how to get out of the situation.
> 
> Thank you Duncan !--
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] btrfs-progs: add task-utils

2014-11-09 Thread Silvio Fricke
Signed-off-by: Silvio Fricke 
---
 task-util.c | 121 
 task-util.h |  33 +
 2 files changed, 154 insertions(+)
 create mode 100644 task-util.c
 create mode 100644 task-util.h

diff --git a/task-util.c b/task-util.c
new file mode 100644
index 000..9268df7
--- /dev/null
+++ b/task-util.c
@@ -0,0 +1,121 @@
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "task-util.h"
+
+struct task_info *task_init(void *(*threadfn)(void *), int (*postfn)(void *),
+   void *thread_private)
+{
+   struct task_info *info = calloc(1, sizeof(struct task_info));
+
+   if (!info)
+   return NULL;
+
+   info->private_data = thread_private;
+   info->threadfn = threadfn;
+   info->postfn = postfn;
+
+   return info;
+}
+
+int task_start(struct task_info *info)
+{
+   int ret;
+
+   if (!info)
+   return -1;
+
+   if (!info->threadfn)
+   return -1;
+
+   ret = pthread_create(&info->id, NULL, info->threadfn,
+info->private_data);
+
+   if (ret == 0)
+   pthread_detach(info->id);
+   else
+   info->id = -1;
+
+   return ret;
+}
+
+void task_stop(struct task_info *info)
+{
+   if (!info)
+   return;
+
+   if (info->periodic.timer_fd)
+   close(info->periodic.timer_fd);
+
+   if (info->id > 0)
+   pthread_cancel(info->id);
+
+   if (info->postfn)
+   info->postfn(info->private_data);
+}
+
+void task_deinit(struct task_info *info)
+{
+   if (!info)
+   return;
+
+   free(info);
+}
+
+int task_period_start(struct task_info *info, unsigned int period_ms)
+{
+   unsigned int ns;
+   unsigned int sec;
+   struct itimerspec itval;
+
+   if (!info)
+   return -1;
+
+   info->periodic.timer_fd = timerfd_create(CLOCK_MONOTONIC, 0);
+   if (info->periodic.timer_fd == -1)
+   return info->periodic.timer_fd;
+
+   info->periodic.wakeups_missed = 0;
+
+   sec = period_ms/1000;
+   ns = (period_ms - (sec * 1000)) * 1000;
+   itval.it_interval.tv_sec = sec;
+   itval.it_interval.tv_nsec = ns;
+   itval.it_value.tv_sec = sec;
+   itval.it_value.tv_nsec = ns;
+
+   return timerfd_settime(info->periodic.timer_fd, 0, &itval, NULL);
+};
+
+void task_period_wait(struct task_info *info)
+{
+   unsigned long long missed;
+   int ret;
+
+   if (!info)
+   return;
+
+   ret = read(info->periodic.timer_fd, &missed, sizeof (missed));
+   if (ret == -1) {
+   perror ("read timer");
+   return;
+   }
+
+   if (missed > 0)
+   info->periodic.wakeups_missed += (missed - 1);
+}
+
+void task_period_stop(struct task_info *info)
+{
+   if (!info)
+   return;
+
+   if (info->periodic.timer_fd) {
+   timerfd_settime(info->periodic.timer_fd, 0, NULL, NULL);
+   close(info->periodic.timer_fd);
+   }
+}
diff --git a/task-util.h b/task-util.h
new file mode 100644
index 000..95f7b5b
--- /dev/null
+++ b/task-util.h
@@ -0,0 +1,33 @@
+
+#ifndef __PROGRESS_
+#define __PROGRESS_
+
+#include 
+
+struct periodic_info {
+   int timer_fd;
+   unsigned long long wakeups_missed;
+};
+
+struct task_info {
+   struct periodic_info periodic;
+   pthread_t id;
+   void *private_data;
+   void *(*threadfn)(void *);
+   int (*postfn)(void *);
+};
+
+/* task life cycle */
+struct task_info *task_init(void *(*threadfn)(void *), int (*postfn)(void *),
+   void *thread_private);
+int task_start(struct task_info *info);
+void task_stop(struct task_info *info);
+void task_deinit(struct task_info *info);
+
+/* periodic life cycle */
+int task_period_start(struct task_info *info, unsigned int period_ms);
+void task_period_wait(struct task_info *info);
+void task_period_stop(struct task_info *info);
+
+#endif /* __PROGRESS_ */
+
-- 
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs-progs: convert: use task for progress indication of metadata creation

2014-11-09 Thread Silvio Fricke
Signed-off-by: Silvio Fricke 
---
 Documentation/btrfs-convert.txt |  2 ++
 Makefile|  6 ++--
 btrfs-convert.c | 64 +
 3 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/Documentation/btrfs-convert.txt b/Documentation/btrfs-convert.txt
index 555fb35..9cc326c 100644
--- a/Documentation/btrfs-convert.txt
+++ b/Documentation/btrfs-convert.txt
@@ -29,6 +29,8 @@ Roll back to ext2fs.
 set filesystem label during conversion.
 -L::
 use label from the converted filesystem.
+-p::
+Show progress of convertation.
 
 EXIT STATUS
 ---
diff --git a/Makefile b/Makefile
index 203597c..f76c6b2 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ AM_CFLAGS = -Wall -D_FILE_OFFSET_BITS=64 -DBTRFS_FLAT_INCLUDES 
-fno-strict-alias
 CFLAGS = -g -O1 -fno-strict-aliasing -rdynamic
 objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \
  root-tree.o dir-item.o file-item.o inode-item.o inode-map.o \
- extent-cache.o extent_io.o volumes.o utils.o repair.o \
+ extent-cache.o extent_io.o volumes.o utils.o repair.o task-util.o \
  qgroup.o raid6.o free-space-cache.o list_sort.o props.o \
  ulist.o qgroup-verify.o backref.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
@@ -20,13 +20,13 @@ libbtrfs_objects = send-stream.o send-utils.o rbtree.o 
btrfs-list.o crc32c.o \
   uuid-tree.o utils-lib.o rbtree-utils.o
 libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
   crc32c.h list.h kerncompat.h radix-tree.h extent-cache.h \
-  extent_io.h ioctl.h ctree.h btrfsck.h version.h
+  extent_io.h ioctl.h ctree.h btrfsck.h version.h task-util.h
 TESTS = fsck-tests.sh convert-tests.sh
 
 INSTALL = install
 prefix ?= /usr/local
 bindir = $(prefix)/bin
-lib_LIBS = -luuid -lblkid -lm -lz -llzo2 -L.
+lib_LIBS = -luuid -lblkid -lm -lz -llzo2 -L. -pthread
 libdir ?= $(prefix)/lib
 incdir = $(prefix)/include/btrfs
 LIBS = $(lib_LIBS) $(libs_static)
diff --git a/btrfs-convert.c b/btrfs-convert.c
index a544fc6..f8ba920 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -38,6 +38,7 @@
 #include "transaction.h"
 #include "crc32c.h"
 #include "utils.h"
+#include "task-util.h"
 #include 
 #include 
 #include 
@@ -45,6 +46,41 @@
 #define INO_OFFSET (BTRFS_FIRST_FREE_OBJECTID - EXT2_ROOT_INO)
 #define EXT2_IMAGE_SUBVOL_OBJECTID BTRFS_FIRST_FREE_OBJECTID
 
+struct private {
+   uint32_t max_copy_inodes;
+   uint32_t cur_copy_inodes;
+   struct task_info *info;
+};
+
+static void *print_copied_inodes(void *p)
+{
+   struct private *priv = p;
+   const char work_indicator[] = { '.', 'o', 'O', 'o' };
+   uint32_t count = 0;
+
+   task_period_start(priv->info, 1000 /* 1s */);
+   while (1) {
+   count++;
+   printf("copy inodes [%c] [%10d/%10d]\r",
+  work_indicator[count % 4], priv->cur_copy_inodes,
+  priv->max_copy_inodes);
+   fflush(stdout);
+   task_period_wait(priv->info);
+   }
+
+   return NULL;
+}
+
+static int after_copied_inodes(void *p)
+{
+   struct private *priv = p;
+
+   printf("\n");
+   task_period_stop(priv->info);
+
+   return 0;
+}
+
 /*
  * Open Ext2fs in readonly mode, read block allocation bitmap and
  * inode bitmap into memory.
@@ -1036,7 +1072,7 @@ fail:
  * scan ext2's inode bitmap and copy all used inodes.
  */
 static int copy_inodes(struct btrfs_root *root, ext2_filsys ext2_fs,
-  int datacsum, int packing, int noxattr)
+  int datacsum, int packing, int noxattr, struct private 
*p)
 {
int ret;
errcode_t err;
@@ -1068,6 +1104,7 @@ static int copy_inodes(struct btrfs_root *root, 
ext2_filsys ext2_fs,
objectid, ext2_fs, ext2_ino,
&ext2_inode, datacsum, packing,
noxattr);
+   p->cur_copy_inodes++;
if (ret)
return ret;
if (trans->blocks_used >= 4096) {
@@ -2197,7 +2234,7 @@ err:
 }
 
 static int do_convert(const char *devname, int datacsum, int packing, int 
noxattr,
-  int copylabel, const char *fslabel)
+  int copylabel, const char *fslabel, int progress)
 {
int i, ret;
int fd = -1;
@@ -2275,11 +2312,23 @@ static int do_convert(const char *devname, int 
datacsum, int packing, int noxatt
goto fail;
}
printf("creating btrfs metadata.\n");
-   ret = copy_inodes(root, ext2_fs, datacsum, packing, noxattr);
+   struct private p = {
+   .max_copy_inodes = (ext2_fs->super->s_inodes_count - 
ext2_fs->super->s_free_inodes_count),
+   .cur_copy_inodes = 0,
+   };
+   if (progress) {
+

Re: Quota question

2014-11-09 Thread Cyril Scetbon

-- 
Cyril SCETBON

> 1)  but is the rewrite in that, or in 
> 3.17, or in the coming 3.18, or not yet entirely there at all... that I 
> simply don't know.

If anyone can answer it'd be great !

> 
> 2) What I *DO* know and *CAN* say with quite some confidence is this:
> 
> With btrfs being in general a copy-on-write filesystem, even deletes 
> normally require /some/ space, because the metadata pointing at the file 
> in question must be rewritten, and that requires free space in ordered to 
> happen.
ok
> 
> At least in the case of full filesystem ENOSPC errors, the space-related 
> questions in the FAQ on the btrfs wiki suggest truncating a file first.  
> Find an existing file that either doesn't matter or that you have a 
> backup of elsewhere, and echo > filename (or echo >| filename if you have 
> noclobber set), therefore truncating the existing file.  A few of those 
> and with luck you'll free enough space to do some actual deletions.
> 
> While to some extent I'm guessing here since I'm not directly familiar 
> with quotas myself, it sounds as if such a severe ENOSPC condition, due 
> in your case to quotas instead of the general filesystem limits I'm more 
> familiar with, might be what you are seeing, particularly since in your 
> tests you first triggered the limit, then found a way to write a bit more 
> anyway, and only THEN found yourself unable to write anything.
> 
> If that's the case, then try the truncate trick and see if it helps.
Unfortunately : http://pastebin.com/1Mw0yiUc
> 
> In the generic filesystem case, if the truncate trick doesn't help, 
> another possible trick is to temporarily add another device of several 
> gigs capacity to the filesystem, enough to let a balance work and 
> hopefully reduce the data/metadata imbalance, after which there should be 
> enough chunk-unallocated free space left on the original device, to do a 
> btrfs device delete of the temporary device, generally still leaving 
> space left, since the balance should have reclaimed some otherwise wasted 
> chunk allocation.
> 
> I can't see how that'd apply to quotas, however, and suspect that the 
> corresponding quota-system solution would be to temporarily up the quota 
> for the affected user, only long enough to let them do the deletions they 
> need to do.  But beyond that I can't go, since I simply don't know enough 
> about the quota domain to further speculate, at this point.

I'm pretty sure it'd work by setting a temporary higher quota, but my questions 
are : 

- why can I exceed the quotas (580MB when 500MB was set as the limit) ?

qgroupid  exclmax_excl
---
1/101999 608632832 524288000

- I really think that the quota limit should take into account an amount of 
space for operations that would free space

For this last question I may think that this issue is caused by the fact that I 
have only one file taking all the space which won't really be the regular use 
case. but i'd like to get an answer :)
> 
> While admittedly limited due to my lack of quota-domain knowledge, 
> perhaps that sheds at least some light on what happened and why, and 
> hopefully on how to get out of the situation.

Thank you Duncan !--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] Btrfs: make xattr replace operations atomic

2014-11-09 Thread Filipe Manana
Replacing a xattr consists of doing a lookup for its existing value, delete
the current value from the respective leaf, release the search path and then
finally insert the new value. This leaves a time window where readers (getxattr,
listxattrs) won't see any value for the xattr. Xattrs are used to store ACLs,
so this has security implications.

This change also fixes 2 other existing issues which were:

*) Deleting the old xattr value without verifying first if the new xattr will
   fit in the existing leaf item (in case multiple xattrs are packed in the
   same item due to name hash collision);

*) Returning -EEXIST when the flag XATTR_CREATE is given and the xattr doesn't
   exist but we have have an existing item that packs muliple xattrs with
   the same name hash as the input xattr. In this case we should return ENOSPC.

A test case for xfstests follows soon.

Thanks to Alexandre Oliva for reporting the non-atomicity of the xattr replace
implementation.

Reported-by: Alexandre Oliva 
Signed-off-by: Filipe Manana 
---

V2: Added missing btrfs_mark_buffer_dirty() call for the case where a replace
happens for an item with no other xattrs and the new xattr value size is
the same as the old xattr value size;
Made btrfs_search_slot not release the path if it returns EOVERFLOW (from
split_leaf), so that we can get the current size of the item when doing
the replace and extend the leaf to a size smaller then what we passed to
btrfs_search_slot if the xattr exists in the item (and the new extend size
is new_size - old_size, excluding xattr's name and sizeof struct 
btrfs_dir_item).

V3: Made btrfs_search_slot not release the path on error logic more explicit
and generic.

V4: Added missing ENOSPC condition - if overflow detected, there's no matching
dir_item for our xattr in the leaf item and there are no flags. Ensure
a replace can't insert a new value.

 fs/btrfs/ctree.c|   2 +-
 fs/btrfs/ctree.h|   5 ++
 fs/btrfs/dir-item.c |  10 ++--
 fs/btrfs/xattr.c| 150 
 4 files changed, 102 insertions(+), 65 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 19bc616..8172341 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2939,7 +2939,7 @@ done:
 */
if (!p->leave_spinning)
btrfs_set_path_blocking(p);
-   if (ret < 0)
+   if (ret < 0 && !p->skip_release_on_error)
btrfs_release_path(p);
return ret;
 }
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a5e471a..e3ab75e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -607,6 +607,7 @@ struct btrfs_path {
unsigned int leave_spinning:1;
unsigned int search_commit_root:1;
unsigned int need_commit_sem:1;
+   unsigned int skip_release_on_error:1;
 };
 
 /*
@@ -3687,6 +3688,10 @@ struct btrfs_dir_item *btrfs_lookup_xattr(struct 
btrfs_trans_handle *trans,
 int verify_dir_item(struct btrfs_root *root,
struct extent_buffer *leaf,
struct btrfs_dir_item *dir_item);
+struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root *root,
+struct btrfs_path *path,
+const char *name,
+int name_len);
 
 /* orphan.c */
 int btrfs_insert_orphan_item(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
index fc8df86..1752625 100644
--- a/fs/btrfs/dir-item.c
+++ b/fs/btrfs/dir-item.c
@@ -21,10 +21,6 @@
 #include "hash.h"
 #include "transaction.h"
 
-static struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root 
*root,
- struct btrfs_path *path,
- const char *name, int name_len);
-
 /*
  * insert a name into a directory, doing overflow properly if there is a hash
  * collision.  data_size indicates how big the item inserted should be.  On
@@ -383,9 +379,9 @@ struct btrfs_dir_item *btrfs_lookup_xattr(struct 
btrfs_trans_handle *trans,
  * this walks through all the entries in a dir item and finds one
  * for a specific name.
  */
-static struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root 
*root,
- struct btrfs_path *path,
- const char *name, int name_len)
+struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root *root,
+struct btrfs_path *path,
+const char *name, int name_len)
 {
struct btrfs_dir_item *dir_item;
unsigned long name_ptr;
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index dcf2013..47b1946 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -29,6 +29,7 @@
 #include "xattr.h"
 #include "disk-io.h"
 #include "props.h"
+#include "locking.h"
 
 
 ssize_t __btrfs_getxatt