Re: why does df spin up disks?

2015-06-29 Thread Holger Hoffstätte
On 06/29/15 03:43, Russell Coker wrote:
> When I have a mounted filesystem why doesn't the kernel store the amount of 
> free space?  Why does it need to spin up a disk that had been spun down?

Most likely because the inode has been evicted due to memory pressure. I can df 
my mostly-idle backup disk "most" of the time without it spinning up once it 
has been mounted & gone to sleep (just did!), but if there's been significant 
memory movement (or issuing drop_caches) on the box it will spin up again 
sometimes. This is not unique to btrfs; other filesystems - at least ext4 - do 
this too, even though they might manage their expiry behaviour differently.

Now, whether the root inode and whatever else df needs *should* ever 
expire after mounting or stay pinned, well, you'd have to ask the VFS folks.

-h

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs replace seems to corrupt the file system

2015-06-29 Thread Duncan
Mordechay Kaganer posted on Mon, 29 Jun 2015 08:02:01 +0300 as excerpted:

> On Sun, Jun 28, 2015 at 10:32 PM, Chris Murphy 
> wrote:
>> On Sun, Jun 28, 2015 at 1:20 PM, Mordechay Kaganer 
>> wrote:
>>
>> Use of dd can cause corruption of the original.
>>
> But doing a block-level copy and taking care that the original volume is
> hidden from the kernel while mounting the new one is safe, isn't it?

As long as neither one is mounted while doing the copy, and one or the 
other is hidden before an attempt to mount, it should be safe, yes.

The base problem is that btrfs can be multi-device, and that it tracks 
the devices belonging to the filesystem based on UUID, so as soon as it 
sees another device with the same UUID, it considers it part of the same 
filesystem.  Writes can go to any of the devices it considers a component 
device, and after a write creates a difference, reads can end up coming 
from the stale one.

Meanwhile, unlike many filesystems, btrfs uses the UUID as part of the 
metadata, so changing the UUID isn't as simple as rewriting a superblock; 
the metadata must be rewritten to the new UUID.  There's actually a tool 
now available to do just that, but it's new enough I'm not even sure it's 
available in release form yet; if so, it'll be in the latest releases.  
Otherwise, it'd be in the integration branch.

And FWIW a different aspect of the same problem can occur in raid1 mode, 
when a device drops out and is later reintroduced, with both devices 
separately mounted rw,degraded and updated in the meantime.  Normally, 
btrfs will track the generation, a monotonically increasing integer, and 
will read from the higher/newer generation, but with separate updates to 
each, if they both happen to have the same generation at reunite...

So for raid1 mode, the recommendation is that if there's a split and one 
continues to be updated, be sure the other one isn't separately mounted 
writable and then the two combined again, or if both must be separately 
mounted writable and then recombined, wipe the one and add it as a new 
device, thus avoiding the possibility of confusion.

> Anyway, what is the "straightforward" and recommended way of replacing
> the underlying device on a single-device btrfs not using any raid
> features? I can see 3 options:
> 
> 1. btrfs replace - as far as I understand, it's primarily intended for
> replacing the member disks under btrfs's raid.

It seems this /can/ work.  You demonstrated that much.  But I'm not sure 
whether btrfs replace was actually designed to do the single-device 
replace.  If not, it almost certainly hasn't been tested for it.  Even if 
so, I'm sure I'm not the only one who hadn't thought of using it that 
way, so while it might have been development-tested for single-device-
replace, it's unlikely to have had the same degree of broader testing of 
actual usage, simply because few even thought of using it that way.

Regardless, you seem to have flushed out some bugs.  Now that they're 
visible and the weekend's over, the devs will likely get to work tracing 
them down and fixing them.

> 2. Add a new volume, then remove the old one. Maybe this way we'll need
> to do a full balance after that?

This is the alternative I'd have used in your scenario (but see below), 
except that a manual balance shouldn't be necessary.  The device add part 
should go pretty fast, as it simply makes more space available.  The 
device remove will go much slower, as in effect it triggers that 
balance, forcing everything over to the newly added, still nearly empty 
device.

You'd do a manual balance if you wanted to convert to raid or some such, 
but from single device to single device, just the add/remove should do it.

> 3. Block-level copy of the partition, then hide the original from the
> kernel to avoid confusion because of the same UUID. Of course, this way
> the volume is going to be off-line until the copy is finished.

This could work too, but in addition to being forced to keep the 
filesystem offline the entire time, the block-level copy will copy any 
problems, etc, too.


But what I'd /prefer/ to do would be to take the opportunity to create a 
new filesystem, possibly using different mkfs.btrfs options or at least 
starting new with a fresh filesystem and thus eliminating any as yet 
undetected or still developing problems with the old filesystem.  Since 
the replace or device remove will end up rewriting everything anyway, 
might as well make a clean break and start fresh, would be my thinking.

You could then use send/receive to copy all the snapshots, etc, over.  
Currently, that would need to be done one at a time, but there's 
discussion of adding a subvolume-recursive mode.

Tho while on the subject of snapshots, it should be noted that btrfs 
operations such as balance don't scale so well with tens of thousands of 
snapshots.  So the recommendation is to try to keep it to 250 snapshots 
or so per subvolume, under 2000 snapshots total, if possible, which of 
c

Scrub errors but no errors ?!?

2015-06-29 Thread Swâmi Petaramesh
Hi,

Using kernel 3.19 on Ubuntu 15.04

I have a BTRFS RAID-1 FS made from 2 LUKS-encrypted devices, each one built 
atop a bcache device.

The machine runs smoothly from this setup.

When I scrub said FS, it ends by printing a message on the console stating that 
"Scrub found errors but corrected them"

However, if I then type:

# btrfs scrub stat /
scrub status for 53a35066-1a7b-4fe1-9f94-a923a7f9e3af
scrub started at Sun Jun 28 11:17:50 2015 and finished after 30657 seconds
total bytes scrubbed: 3.01TiB with 0 errors

# btrfs dev stat /
[/dev/mapper/c_b].write_io_errs   0
[/dev/mapper/c_b].read_io_errs    0
[/dev/mapper/c_b].flush_io_errs   0
[/dev/mapper/c_b].corruption_errs 0
[/dev/mapper/c_b].generation_errs 0
[/dev/mapper/c_a].write_io_errs   0
[/dev/mapper/c_a].read_io_errs    0
[/dev/mapper/c_a].flush_io_errs   0
[/dev/mapper/c_a].corruption_errs 0
[/dev/mapper/c_a].generation_errs 0

...How come?

TIA for any insight.

-- 
Swâmi Petaramesh  http://petaramesh.org PGP 9076E32E



Re: [RFC PATCH V11 02/21] Btrfs: subpagesize-blocksize: Fix whole page write.

2015-06-29 Thread Chandan Rajendra
On Friday 26 Jun 2015 17:50:54 Liu Bo wrote:
> On Mon, Jun 01, 2015 at 08:52:37PM +0530, Chandan Rajendra wrote:
> > For the subpagesize-blocksize scenario, a page can contain multiple
> > blocks. In such cases, this patch handles writing data to files.
> > 
> > Also, when setting EXTENT_DELALLOC, we no longer set the EXTENT_UPTODATE
> > bit on the extent_io_tree since uptodate status is being tracked by the
> > bitmap pointed to by page->private.
> 
> To be honest, I'm not sure why we set the EXTENT_UPTODATE bit for data, as we
> don't check for that bit at all for now; correct me if I'm wrong.

Yes, I didn't find any code using the EXTENT_UPTODATE flag. That is probably
because we could get away with referring to the page's PG_uptodate flag in
the blocksize == pagesize scenario. But for the subpagesize-blocksize scenario we
need BLK_STATE_UPTODATE to determine if a page's PG_uptodate flag can be set.

> 
> > Signed-off-by: Chandan Rajendra 
> > ---
> > 
> >  fs/btrfs/extent_io.c | 141 +++
> >  fs/btrfs/file.c      |  16 ++
> >  fs/btrfs/inode.c     |  58 -
> >  3 files changed, 125 insertions(+), 90 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index d37badb..3736ab5 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -1283,9 +1283,8 @@ int clear_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
> >  int set_extent_delalloc(struct extent_io_tree *tree, u64 start, u64 end,
> >  			struct extent_state **cached_state, gfp_t mask)
> >  {
> > -	return set_extent_bit(tree, start, end,
> > -			      EXTENT_DELALLOC | EXTENT_UPTODATE,
> > -			      NULL, cached_state, mask);
> > +	return set_extent_bit(tree, start, end, EXTENT_DELALLOC,
> > +			NULL, cached_state, mask);
> >  }
> >  
> >  int set_extent_defrag(struct extent_io_tree *tree, u64 start, u64 end,
> > @@ -1498,25 +1497,6 @@ int extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end)
> >  	return 0;
> >  }
> >  
> > -/*
> > - * helper function to set both pages and extents in the tree writeback
> > - */
> > -static int set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end)
> > -{
> > -	unsigned long index = start >> PAGE_CACHE_SHIFT;
> > -	unsigned long end_index = end >> PAGE_CACHE_SHIFT;
> > -	struct page *page;
> > -
> > -	while (index <= end_index) {
> > -		page = find_get_page(tree->mapping, index);
> > -		BUG_ON(!page); /* Pages should be in the extent_io_tree */
> > -		set_page_writeback(page);
> > -		page_cache_release(page);
> > -		index++;
> > -	}
> > -	return 0;
> > -}
> > -
> >  /* find the first state struct with 'bits' set after 'start', and
> >   * return it.  tree->lock must be held.  NULL will returned if
> >   * nothing was found after 'start'
> > @@ -2080,6 +2060,14 @@ static int page_read_complete(struct page *page)
> >  	return !test_page_blks_state(page, BLK_STATE_IO, start, end, 0);
> >  }
> >  
> > +static int page_write_complete(struct page *page)
> > +{
> > +	u64 start = page_offset(page);
> > +	u64 end = start + PAGE_CACHE_SIZE - 1;
> > +
> > +	return !test_page_blks_state(page, BLK_STATE_IO, start, end, 0);
> > +}
> > +
> >  int free_io_failure(struct inode *inode, struct io_failure_record *rec)
> >  {
> >  	int ret;
> > @@ -2575,38 +2563,37 @@ int end_extent_writepage(struct page *page, int err, u64 start, u64 end)
> >   */
> >  static void end_bio_extent_writepage(struct bio *bio, int err)
> >  {
> > +	struct btrfs_page_private *pg_private;
> >  	struct bio_vec *bvec;
> > +	unsigned long flags;
> >  	u64 start;
> >  	u64 end;
> > +	int clear_writeback;
> >  	int i;
> >  
> >  	bio_for_each_segment_all(bvec, bio, i) {
> >  		struct page *page = bvec->bv_page;
> >  
> > -		/* We always issue full-page reads, but if some block
> > -		 * in a page fails to read, blk_update_request() will
> > -		 * advance bv_offset and adjust bv_len to compensate.
> > -		 * Print a warning for nonzero offsets, and an error
> > -		 * if they don't add up to a full page.  */
> > -		if (bvec->bv_offset || bvec->bv_len != PAGE_CACHE_SIZE) {
> > -			if (bvec->bv_offset + bvec->bv_len != PAGE_CACHE_SIZE)
> > -				btrfs_err(BTRFS_I(page->mapping->host)->root->fs_info,
> > -				   "partial page write in btrfs with offset %u and length %u",
> > -					bvec->bv_offset, bvec->bv_len);
> > -			else
> > -				btrfs_info(BTRFS_I(page->mapping->host)->root->fs_info,
> > -				   "incomplete page wri

Re: [PATCH 1/2] [btrfs] btrfs_rename: abort transaction in case of error.

2015-06-29 Thread Filipe David Manana
On Sun, Jun 28, 2015 at 10:47 PM, Davide C. C. Italiano
 wrote:
> From: Davide Italiano 
>
> btrfs_insert_inode_ref() may fail and we want to make sure
> the transaction is aborted before calling btrfs_end_transaction(),
> as it already happens everywhere else in this function in case
> of error.
>
> Signed-off-by: Davide Italiano 
> ---
>  fs/btrfs/inode.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8bb0136..59c475c 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9114,8 +9114,11 @@ static int btrfs_rename(struct inode *old_dir, struct 
> dentry *old_dentry,
>  new_dentry->d_name.len,
>  old_ino,
>  btrfs_ino(new_dir), index);
> -   if (ret)
> +   if (ret) {
> +   btrfs_abort_transaction(trans, root, ret);
> goto out_fail;
> +   }
> +

Hi,

I don't think we need to abort the transaction here. The reason it's not
being done is likely that at that point the trees are still in a
consistent state (i.e. we haven't modified any of them yet), not
because it was forgotten. So aborting there is
unnecessary/excessive.

thanks

> /*
>  * this is an ugly little race, but the rename is required
>  * to make sure that if we crash, the inode is either at the
> --
> 2.4.3
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: [PATCH 2/2] [btrfs] btrfs_rename(): don't ignore btrfs_end_transaction() return

2015-06-29 Thread Filipe David Manana
On Sun, Jun 28, 2015 at 10:47 PM, Davide C. C. Italiano
 wrote:
> From: Davide Italiano 
>
> btrfs_end_transaction() can return an error -- this happens, e.g.
> if it tries to commit and the transaction was aborted in the meanwhile.
> Swallowing the error is wrong, so explicitly return it.
>
> Signed-off-by: Davide Italiano 
> ---
>  fs/btrfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 59c475c..7764132 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9199,7 +9199,7 @@ static int btrfs_rename(struct inode *old_dir, struct 
> dentry *old_dentry,
> btrfs_end_log_trans(root);
> }
>  out_fail:
> -   btrfs_end_transaction(trans, root);
> +   ret = btrfs_end_transaction(trans, root);

Hi,

Good intention, but it now swallows errors from earlier places in
the code that jump to the out_fail label. For example, if the call to
btrfs_set_inode_index() fails, we jump to out_fail and lose the
error value it returned (btrfs_end_transaction() may well return 0),
so userspace thinks everything succeeded when it didn't.
Correct fix is to set ret to the return value of
btrfs_end_transaction() only if ret is currently zero.

thanks

>  out_notrans:
> if (old_ino == BTRFS_FIRST_FREE_OBJECTID)
> up_read(&root->fs_info->subvol_sem);
> --
> 2.4.3
>



-- 
Filipe David Manana,



WARNING: at /fs/btrfs/extent-tree.c:4029 btrfs_free_reserved_data_space

2015-06-29 Thread David Weber
Hi,

we are testing Btrfs as a kvm storage system with 
"defaults,space_cache,nodatacow" mount options and regular snapshots. While 
testing we sometimes get these warnings:
WARNING: CPU: 5 PID: 11335 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:4029 btrfs_free_reserved_data_space+0x102/0x110 [btrfs]()

Full log:
https://gist.github.com/anonymous/5f672f72899a3d52c20c

Once they start, they flood dmesg with several messages per second. After a 
reboot, it takes a while until they start to appear again. 
We have seen them with Linux 4.0.6 and 4.1.

This was already reported a few times:
https://bugzilla.opensuse.org/show_bug.cgi?id=904023
https://bugzilla.redhat.com/show_bug.cgi?id=1173937
http://www.spinics.net/lists/linux-btrfs/msg39069.html
https://lkml.org/lkml/2014/10/22/899

Each time without a solution.
Is this a real problem, or can we safely ignore it?

Cheers,
David


Re: Scrub errors but no errors ?!?

2015-06-29 Thread Duncan
Swâmi Petaramesh posted on Mon, 29 Jun 2015 10:42:14 +0200 as excerpted:

> Using kernel 3.19 on Ubuntu 15.04
> 
> I have a BTRFS Raid-1 FS made from 2 Luks-encrypted devices, each one
> built atop a bcache device.
> 
> The machine runs smoothly from this setup.
> 
> When I scrub said FS, it ends by spitting a message on console stating
> that "Scrub found errors but corrected them"
> 
> However if I then type :
> 
> # btrfs scrub stat /
> scrub status for 53a35066-1a7b-4fe1-9f94-a923a7f9e3af
> scrub started at Sun Jun 28 11:17:50 2015 and finished after
> 30657 seconds
> total bytes scrubbed: 3.01TiB with 0 errors


FWIW, I've been seeing (and wondering about) the same thing, tho with a newer 
kernel (v4.1.0) and userspace (also v4.1).

I'm running pair-device raid1 mode (both data/metadata) but one of the 
two devices is slowly relocating sectors, so I thought it /might/ be 
btrfs detecting some oddness due to that.

But if you're seeing it too, that's unlikely.  So indeed, what's this 
found errors and corrected them bit, with no errors shown as found?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: WARNING: at /fs/btrfs/extent-tree.c:4029 btrfs_free_reserved_data_space

2015-06-29 Thread Lutz Vieweg

On 06/29/2015 11:35 AM, David Weber wrote:

> we are testing Btrfs as a kvm storage system with
> "defaults,space_cache,nodatacow" mount options and regular snapshots.


That sounds brave - even with "nodatacow", it appeared to me
that using btrfs for frequently, partially overwritten files like
VM images results in excessively fragmented files.
And taking snapshots kind of counteracts "nodatacow", since after
a snapshot the first write to each shared block has to be COWed anyway.

What does "filefrag" tell about your VM images on btrfs?

(As much as I like btrfs for other purposes, I currently stay
with XFS for VM images, database files and the like.)

Regards,

Lutz Vieweg



Re: [PATCH] check: check so offset is not bigger then the leaf

2015-06-29 Thread Trollkarlen Marklund

> On 18 Jun 2015, at 19:44, David Sterba  wrote:
> 
> On Thu, Jun 18, 2015 at 01:59:13AM +0200, Robert Marklund wrote:
>> This could crash before because of dangerous dangling
>> offset of pointer.
> 
> That's right, this can happen. There are more btrfs_item_ptr that would
> be good to validate that way, namely in the checker as it's most likely
> to see corrupted data.
> 
> I think it's worth adding a wrapper macro for that, which would be like
> 
> (int) btrfs_item_ptr_validate(ei, leaf, slot, struct ..., *optional_key)
> 
> and return 0 if it's ok, 1 if there's a problem and prints the details.
> 
>> Signed-off-by: Robert Marklund 
>> ---
>> cmds-check.c | 10 ++
>> 1 file changed, 10 insertions(+)
>> 
>> diff --git a/cmds-check.c b/cmds-check.c
>> index 778f141..da36758 100644
>> --- a/cmds-check.c
>> +++ b/cmds-check.c
>> @@ -8906,6 +8906,16 @@ static int build_roots_info_cache(struct btrfs_fs_info *info)
>>  goto next;
>> 
>>  ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item);
>> +
>> +if ((long long)ei > info->extent_root->leafsize) {
>> + fprintf(stderr, "Bad leaf = %p, slot = %d\n", leaf, slot);
>> + fprintf(stderr, "item ptr = %p\n", ei);
>> + fprintf(stderr, "objectid = %llx\n", found_key.objectid);
>> + fprintf(stderr, "type = %x\n", found_key.type);
>> + fprintf(stderr, "offset   = %llx\n", found_key.offset);
> 
> Hm, I'm not sure whether to continue or fail at this point.
> 

I'm not either :)
But for me it's better to keep trying until you hit the wall for real.


> Do you have a crafted filesystem image that can reproduce that or was
> that found by code inspection?

I have a failed filesystem, caused by a failing disk, that I tried to fix/recover.
Then I stumbled on this, and later on some more places other than this.
I'll submit those too, in a nicer way, once my filesystem is rescued.
 

> 
>> + goto next;
>> +}
>> +
>>  flags = btrfs_extent_flags(leaf, ei);
>> 
>>  if (found_key.type == BTRFS_EXTENT_ITEM_KEY &&



Re: btrfs replace seems to corrupt the file system

2015-06-29 Thread Mike Fleetwood
On 29 June 2015 at 09:08, Duncan <1i5t5.dun...@cox.net> wrote:
> Meanwhile, unlike many filesystems, btrfs uses the UUID as part of the
> metadata, so changing the UUID isn't as simple as rewriting a superblock;
> the metadata must be rewritten to the new UUID.  There's actually a tool
> now available to do just that, but it's new enough I'm not even sure it's
> available in release form yet; if so, it'll be latest releases.
> Otherwise, it'd be in integration branch.

FYI, btrfstune with the ability to change the filesystem UUID was included
in btrfs-progs 4.1, released last week (Mon, 22 Jun 2015).
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg44182.html

Mike


Re: btrfs replace seems to corrupt the file system

2015-06-29 Thread Mordechay Kaganer
B.H.

Regarding the main issue, the drive that was "recovered" using Noah's
trick (mount -o degraded, then btrfs replace cancel) appears to be
clean. At least, it passes scrub without any errors. It even contains
all the changes that were made while the replace was ongoing. I've also
run MD's consistency check on the destination drive, which contains the
corrupt FS, and it appears to be clean from MD's point of view, so I
think this can be considered "proof" that btrfs replace was actually
the source of the corruption. Before upgrading the kernel/btrfs-progs,
I'll try to reproduce the situation with smaller loopback devices. I'm
not sure it is so easily reproducible. The original replace operation
took more than 5 days and I'm not going to play with the actual data
again ;-).

If the "corrupt" version of the FS may help in debugging the issue,
please contact me today, before we wipe it.


On Mon, Jun 29, 2015 at 11:08 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> Mordechay Kaganer posted on Mon, 29 Jun 2015 08:02:01 +0300 as excerpted:
>> 1. btrfs replace - as far as i understand, it's primarily intended for
>> replacing the member disks under btrfs's raid.
>
> It seems this /can/ work.  You demonstrated that much.  But I'm not sure
> whether btrfs replace was actually designed to do the single-device
> replace.  If not, it almost certainly hasn't been tested for it.  Even if
> so, I'm sure I'm not the only one who hadn't thought of using it that
> way, so while it might have been development-tested for single-device-
> replace, it's unlikely to have had the same degree of broader testing of
> actual usage, simply because few even thought of using it that way.

*If* replace is usable for a single-drive FS, this method has the
advantage that it can be cancelled in the middle and (for a single
drive, using Noah's trick) even after the operation has finished. For
a multi-drive FS, the trick wouldn't help once any changes were
made to the FS after the replace.

-- 
Moshiach NOW!
Moshiach is coming very soon, prepare yourself!
Long live our master, our teacher and our rabbi, King Moshiach, forever and ever!


Linux 4.1.0: BTRFS: error (device sdb) in btrfs_run_delayed_refs:2821: errno=-17 Object already exists

2015-06-29 Thread Martin Tippmann
Hi, is this a known issue, and is there already a patch out there for
this? This is also happening with Linux 4.0.4. I can reproduce this,
except that it only happens under load...

Mount options:
/dev/sdb /media/storage2 btrfs ro,noatime,compress=lzo,space_cache 0 0

# uname -r
4.1.0-040100-generic

# btrfs fi show /dev/sdb
Label: 'storage2'  uuid: 4b0fdfb4-8d9e-43e4-b51c-15446a792d9b
	Total devices 1 FS bytes used 676.37GiB
	devid    1 size 3.64TiB used 829.06GiB path /dev/sdb

# btrfs fi df /media/storage2/
Data, single: total=824.00GiB, used=674.95GiB
System, DUP: total=32.00MiB, used=112.00KiB
Metadata, DUP: total=2.50GiB, used=1.42GiB
GlobalReserve, single: total=496.00MiB, used=0.00B

btrfs-progs v4.0



[23514.127519] [ cut here ]
[23514.127566] WARNING: CPU: 0 PID: 920 at /home/kernel/COD/linux/fs/btrfs/super.c:260 __btrfs_abort_transaction+0x5f/0x140 [btrfs]()
[23514.127569] BTRFS: Transaction aborted (error -17)
[23514.127572] Modules linked in: x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_ssif ipmi_devintf lrw gf128mul glue_helper dcdbas ablk_helper cryptd joydev mei_me sb_edac edac_core mac_hid 8250_fintek acpi_pad acpi_power_meter lpc_ich mei shpchp ipmi_si ipmi_msghandler btrfs xor raid6_pq squashfs nfsv3 nfs_acl nfs lockd grace sunrpc fscache overlay hid_generic usbhid hid ixgbe igb vxlan ahci ip6_udp_tunnel dca udp_tunnel libahci ptp i2c_algo_bit megaraid_sas pps_core wmi mdio
[23514.127631] CPU: 0 PID: 920 Comm: btrfs-transacti Tainted: G         C  4.1.0-040100-generic #201506220235
[23514.127634] Hardware name: Dell Inc. PowerEdge R720/0HJK12, BIOS 2.2.2 01/16/2014
[23514.127637]  0104 88081a8dfbf8 81800595 0007
[23514.127642]  88081a8dfc48 88081a8dfc38 8107d0e7 88081a8dfc38
[23514.127646]  8807c2ebbe40 88101b971000 ffef 0b05
[23514.127651] Call Trace:
[23514.127661]  [] dump_stack+0x45/0x57
[23514.127668]  [] warn_slowpath_common+0x97/0xe0
[23514.127673]  [] warn_slowpath_fmt+0x46/0x50
[23514.127696]  [] ? __btrfs_run_delayed_refs+0x585/0x620 [btrfs]
[23514.127712]  [] __btrfs_abort_transaction+0x5f/0x140 [btrfs]
[23514.127732]  [] btrfs_run_delayed_refs.part.83+0x175/0x290 [btrfs]
[23514.127749]  [] ? btrfs_free_path+0x2a/0x40 [btrfs]
[23514.127767]  [] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
[23514.127792]  [] commit_cowonly_roots+0x10d/0x2d8 [btrfs]
[23514.127811]  [] ? btrfs_run_delayed_refs.part.83+0x230/0x290 [btrfs]
[23514.127834]  [] btrfs_commit_transaction+0x57d/0xb60 [btrfs]
[23514.127855]  [] transaction_kthread+0x1d5/0x250 [btrfs]
[23514.127875]  [] ? open_ctree+0x1900/0x1900 [btrfs]
[23514.127882]  [] kthread+0xc9/0xe0
[23514.127887]  [] ? flush_kthread_worker+0x90/0x90
[23514.127893]  [] ret_from_fork+0x42/0x70
[23514.127898]  [] ? flush_kthread_worker+0x90/0x90
[23514.127901] ---[ end trace a2b494fb2ce24c5e ]---
[23514.127906] BTRFS: error (device sdb) in btrfs_run_delayed_refs:2821: errno=-17 Object already exists
[23514.127936] BTRFS info (device sdb): forced readonly
[23514.127940] BTRFS warning (device sdb): Skipping commit of aborted transaction.
[23514.127943] BTRFS: error (device sdb) in cleanup_transaction:1692: errno=-17 Object already exists


regards
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Linux 4.1.0: BTRFS: error (device sdb) in btrfs_run_delayed_refs:2821: errno=-17 Object already exists

2015-06-29 Thread Martin Tippmann
Hi, is this a know issue and is there a already a patch out there for
this? This is also happening with Linux 4.0.4. I can reproduce this
expect it's happening under load...

Mount options:
/dev/sdb /media/storage2 btrfs ro,noatime,compress=lzo,space_cache 0 0

# uname -r
4.1.0-040100-generic

# btrfs fi show /dev/sdb
Label: 'storage2'  uuid: 4b0fdfb4-8d9e-43e4-b51c-15446a792d9b
Total devices 1 FS bytes used 676.37GiB
devid1 size 3.64TiB used 829.06GiB path /dev/sdb

# btrfs fi df /media/storage2/
Data, single: total=824.00GiB, used=674.95GiB
System, DUP: total=32.00MiB, used=112.00KiB
Metadata, DUP: total=2.50GiB, used=1.42GiB
GlobalReserve, single: total=496.00MiB, used=0.00B

btrfs-progs v4.0



[23514.127519] [ cut here ]
[23514.127566] WARNING: CPU: 0 PID: 920 at
/home/kernel/COD/linux/fs/btrfs/super.c:260
__btrfs_abort_transaction+0x5f/0x140 [btrfs]()
[23514.127569] BTRFS: Transaction aborted (error -17)
[23514.127572] Modules linked in: x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_ssif ipmi_devintf lrw
gf128mul glue_helper dcdbas ablk_helper cryptd joydev mei_me sb_edac
edac_core mac_hid 8250_fintek acpi_pad acpi_power_meter lpc_ich mei
shpchp ipmi_si ipmi_msghandler btrfs xor raid6_pq squashfs nfsv3
nfs_acl nfs lockd grace sunrpc fscache overlay hid_generic usbhid hid
ixgbe igb vxlan ahci ip6_udp_tunnel dca udp_tunnel libahci ptp
i2c_algo_bit megaraid_sas pps_core wmi mdio
[23514.127631] CPU: 0 PID: 920 Comm: btrfs-transacti Tainted: G
 C  4.1.0-040100-generic #201506220235
[23514.127634] Hardware name: Dell Inc. PowerEdge R720/0HJK12, BIOS
2.2.2 01/16/2014
[23514.127637]  0104 88081a8dfbf8 81800595
0007
[23514.127642]  88081a8dfc48 88081a8dfc38 8107d0e7
88081a8dfc38
[23514.127646]  8807c2ebbe40 88101b971000 ffef
0b05
[23514.127651] Call Trace:
[23514.127661]  [] dump_stack+0x45/0x57
[23514.127668]  [] warn_slowpath_common+0x97/0xe0
[23514.127673]  [] warn_slowpath_fmt+0x46/0x50
[23514.127696]  [] ?
__btrfs_run_delayed_refs+0x585/0x620 [btrfs]
[23514.127712]  []
__btrfs_abort_transaction+0x5f/0x140 [btrfs]
[23514.127732]  []
btrfs_run_delayed_refs.part.83+0x175/0x290 [btrfs]
[23514.127749]  [] ? btrfs_free_path+0x2a/0x40 [btrfs]
[23514.127767]  [] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
[23514.127792]  [] commit_cowonly_roots+0x10d/0x2d8 [btrfs]
[23514.127811]  [] ?
btrfs_run_delayed_refs.part.83+0x230/0x290 [btrfs]
[23514.127834]  []
btrfs_commit_transaction+0x57d/0xb60 [btrfs]
[23514.127855]  [] transaction_kthread+0x1d5/0x250 [btrfs]
[23514.127875]  [] ? open_ctree+0x1900/0x1900 [btrfs]
[23514.127882]  [] kthread+0xc9/0xe0
[23514.127887]  [] ? flush_kthread_worker+0x90/0x90
[23514.127893]  [] ret_from_fork+0x42/0x70
[23514.127898]  [] ? flush_kthread_worker+0x90/0x90
[23514.127901] ---[ end trace a2b494fb2ce24c5e ]---
[23514.127906] BTRFS: error (device sdb) in btrfs_run_delayed_refs:2821: errno=-17 Object already exists
[23514.127936] BTRFS info (device sdb): forced readonly
[23514.127940] BTRFS warning (device sdb): Skipping commit of aborted transaction.
[23514.127943] BTRFS: error (device sdb) in cleanup_transaction:1692: errno=-17 Object already exists
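The `errno=-17` in these messages is a negated errno value, following the kernel convention of returning `-ERRNO` on failure; on Linux, 17 is `EEXIST`. A minimal userspace sketch (assuming a Linux target with glibc or musl) for decoding such a value; `kernel_err_str()` is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Kernel code returns negative errno values; flip the sign and ask the
 * C library for its description of the error. */
static const char *kernel_err_str(int kernel_ret)
{
    assert(kernel_ret < 0); /* only meaningful for error returns */
    return strerror(-kernel_ret);
}
```

With glibc, `kernel_err_str(-17)` returns "File exists"; btrfs prints its own wording for the same code ("Object already exists"), as the log above shows.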


regards
Martin


Re: [PATCH] Btrfs-progs: add feature to get mininum size for resizing a fs/device

2015-06-29 Thread David Sterba
On Wed, Jun 17, 2015 at 12:44:55PM +0100, fdman...@kernel.org wrote:
> Currently there is no way for a user to know what the minimum size is that a
> device of a btrfs filesystem can be resized to. Sometimes the value of
> total allocated space (sum of all allocated chunks/device extents), which
> can be parsed from 'btrfs filesystem show' and 'btrfs filesystem usage',
> works as the minimum size, but sometimes it does not, namely when device
> extents have to be relocated to holes (unallocated space) within the new
> size of the device (the total allocated space sum).
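The paragraph above can be illustrated with a deliberately simplified model, not the algorithm this patch implements: treat the device as a sorted list of allocated extents, use the sum of their lengths as the candidate size, and check whether every extent lying beyond that size can be moved whole, first-fit, into a hole below it. `struct dev_extent`, `total_allocated()` and `fits_after_shrink()` are hypothetical names, and the model ignores reserved areas and any extra space the allocator may need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one allocated device extent: [start, start + len). */
struct dev_extent {
    uint64_t start;
    uint64_t len;
};

#define MAX_HOLES 64 /* arbitrary bound for this sketch */

/* Sum of all allocated extents: the naive candidate for the minimum size. */
static uint64_t total_allocated(const struct dev_extent *ext, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += ext[i].len;
    return sum;
}

/*
 * Simplified check: can every extent live below new_size?  Extents that end
 * within new_size stay put; the others must be relocated whole (never split)
 * into the first hole below new_size that is big enough.
 * 'ext' must be sorted by start and non-overlapping.
 */
static bool fits_after_shrink(const struct dev_extent *ext, size_t n,
                              uint64_t new_size)
{
    uint64_t hole_len[MAX_HOLES]; /* only hole sizes matter for this check */
    size_t nholes = 0;
    uint64_t cursor = 0; /* end of the last extent that stays in place */

    /* Record the gaps left between the extents that stay in place. */
    for (size_t i = 0; i < n; i++) {
        uint64_t end = ext[i].start + ext[i].len;
        if (end > new_size)
            continue; /* must be relocated; its old space counts as free */
        if (ext[i].start > cursor) {
            assert(nholes < MAX_HOLES);
            hole_len[nholes++] = ext[i].start - cursor;
        }
        cursor = end;
    }
    if (cursor < new_size) { /* free tail below the new size */
        assert(nholes < MAX_HOLES);
        hole_len[nholes++] = new_size - cursor;
    }

    /* Place each extent that must move into the first hole big enough. */
    for (size_t i = 0; i < n; i++) {
        if (ext[i].start + ext[i].len <= new_size)
            continue;
        bool placed = false;
        for (size_t h = 0; h < nholes && !placed; h++) {
            if (hole_len[h] >= ext[i].len) {
                hole_len[h] -= ext[i].len;
                placed = true;
            }
        }
        if (!placed)
            return false;
    }
    return true;
}
```

With extents at [0,1), [2,3) and [3,5) (in GiB), the allocated total is 4, but the 2 GiB extent fits in neither 1 GiB hole, so the sketch reports that 4 is too small while 5 works.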
> 
> This change adds the ability to reliably compute such a minimum value and
> extends 'btrfs filesystem resize' with the following syntax to get such
> value:

The test fails after I do this before unmount:

$SUDO_HELPER $TOP/btrfs balance start -mconvert=single -sconvert=single -f $TEST_MNT
shrink_test

Output:

### root_helper ../btrfs filesystem resize get_min_size ../tests/mnt
6480199680 bytes (6.04GiB)
min size = 6480199680
### root_helper ../btrfs filesystem resize 6480199680 ../tests/mnt
ERROR: unable to resize '../tests/mnt' - No space left on device
Resize '../tests/mnt' of '6480199680'

Last successful resize before this was:
Resize '../tests/mnt' of '7553941504'
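The sizes in this output are byte counts, and the test keeps only the first field of the `get_min_size` output (the script's `cut -d ' ' -f 1` step). A small C sketch of the same parsing plus the GiB conversion; `parse_min_size()` and `bytes_to_gib()` are hypothetical helpers:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define BYTES_PER_GIB (1024.0 * 1024.0 * 1024.0)

/* Parse the leading byte count from a line such as
 * "6480199680 bytes (6.04GiB)"; returns 0 if no number is found. */
static uint64_t parse_min_size(const char *line)
{
    char *end;
    unsigned long long v;

    assert(line != NULL);
    errno = 0;
    v = strtoull(line, &end, 10);
    if (errno != 0 || end == line)
        return 0;
    return (uint64_t)v;
}

static double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / BYTES_PER_GIB;
}
```

6480199680 bytes is exactly 6.03515625 GiB (shown rounded as 6.04GiB), and the last successful size differs from the failing one by 7553941504 - 6480199680 = 1073741824 bytes, exactly 1 GiB, which matches the 1G size of the files the test allocates.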

>btrfs filesystem resize [devid:]get_min_size

I don't think this is the right interface, IMHO this fits into the
inspect-internal group. The syntax for 'fi resize' is a bit cumbersome, I'd
like to avoid complicating it further. But you can keep it as-is until the
bugs are fixed.

> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -46,7 +46,7 @@ libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \
>  libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
>  crc32c.h list.h kerncompat.h radix-tree.h extent-cache.h \
>  extent_io.h ioctl.h ctree.h btrfsck.h version.h
> -TESTS = fsck-tests.sh convert-tests.sh
> +TESTS = fsck-tests.sh convert-tests.sh shrink-min-size-tests.sh
>  
>  prefix ?= @prefix@
>  exec_prefix = @exec_prefix@
> @@ -161,6 +161,10 @@ $(BUILDDIRS):
>   @echo "Making all in $(patsubst build-%,%,$@)"
>   $(Q)$(MAKE) $(MAKEOPTS) -C $(patsubst build-%,%,$@)
>  
> +test-shrink-min-size: btrfs mkfs.btrfs
> + @echo "[TEST]   shrink-min-size-tests.sh"
> + $(Q)bash tests/shrink-min-size-tests.sh

Please move the test under test-misc; for now it's a catch-all category.

General comments to the test script(s):
* the tests are supposed to be run unprivileged as much as possible, so the
  SUDO_HELPER needs to be used explicitly
* use the git binaries everywhere

(diff below)

> +shrink_test()
> +{
> + min_size=$(btrfs filesystem resize get_min_size $TEST_MNT)

sudo helper, use binary from git

> + if [ $? != 0 ]; then
> + _fail "Failed to get minimum size"
> + fi
> + min_size=$(echo $min_size | cut -d ' ' -f 1)
> + echo "min size = ${min_size}" >> $RESULTS
> + run_check btrfs filesystem resize $min_size $TEST_MNT

sudo helper, use binary from git

> +}
> +
> +run_check truncate -s 20G $IMAGE
> +run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $IMAGE

sudo helper not necessary, it's a file-backed image

> +run_check $SUDO_HELPER mount $IMAGE $TEST_MNT
> +
> +# Create 7 data block groups, each with a size of 1Gb.
> +for ((i = 1; i <= 7; i++)); do
> + run_check fallocate -l 1G $TEST_MNT/foo$i

This failed as the mountpoint is created by root and I'm running the
test under my user. A chown after mount fixed it.

> +done
[...]

@@ -23,19 +23,23 @@ setup_root_helper

 shrink_test()
 {
-   min_size=$(btrfs filesystem resize get_min_size $TEST_MNT)
+   min_size=$(run_check_stdout $SUDO_HELPER $TOP/btrfs filesystem resize get_min_size $TEST_MNT)
    if [ $? != 0 ]; then
        _fail "Failed to get minimum size"
    fi
    min_size=$(echo $min_size | cut -d ' ' -f 1)
    echo "min size = ${min_size}" >> $RESULTS
-   run_check btrfs filesystem resize $min_size $TEST_MNT
+   run_check $SUDO_HELPER $TOP/btrfs filesystem resize $min_size $TEST_MNT
 }

 run_check truncate -s 20G $IMAGE
-run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $IMAGE
+run_check $TOP/mkfs.btrfs -f $IMAGE
 run_check $SUDO_HELPER mount $IMAGE $TEST_MNT

+user=$(id -un)
+group=$(id -gn)
+run_check $SUDO_HELPER chown $user:$group $TEST_MNT
+
 # Create 7 data block groups, each with a size of 1Gb.
 for ((i = 1; i <= 7; i++)); do
    run_check fallocate -l 1G $TEST_MNT/foo$i
@@ -43,7 +47,7 @@ done

 # Make sure they are persisted (all the chunk, device and block group items
 # added to the chunk/dev/extent trees).
-run_check btrfs filesystem sync $TEST_MNT
+run_check $TOP/btrfs filesystem sync $TEST_MNT

 # Now remove 3 of those 1G files. This will result in 3 block groups becoming
 # unused, which will be automatically deleted by the cleaner kthread, and this
@@ -58,9 +62,9 @@ run_check rm -f $TEST_MNT/foo6
 # groups - it coul

Re: [PATCH] Btrfs-progs: add feature to get mininum size for resizing a fs/device

2015-06-29 Thread Filipe Manana
On Mon, Jun 29, 2015 at 4:42 PM, David Sterba  wrote:
> On Wed, Jun 17, 2015 at 12:44:55PM +0100, fdman...@kernel.org wrote:
>> Currently there is no way for a user to know what the minimum size is that a
>> device of a btrfs filesystem can be resized to. Sometimes the value of
>> total allocated space (sum of all allocated chunks/device extents), which
>> can be parsed from 'btrfs filesystem show' and 'btrfs filesystem usage',
>> works as the minimum size, but sometimes it does not, namely when device
>> extents have to be relocated to holes (unallocated space) within the new
>> size of the device (the total allocated space sum).
>>
>> This change adds the ability to reliably compute such a minimum value and
>> extends 'btrfs filesystem resize' with the following syntax to get such
>> value:
>
> The test fails after I do this before unmount:
>
> $SUDO_HELPER $TOP/btrfs balance start -mconvert=single -sconvert=single -f $TEST_MNT
> shrink_test

Where are you doing this exactly?

Just tried the following:  https://friendpaste.com/2U7C4gBBLBjo4e2v1ZnJP2

>
> Output:
>
> ### root_helper ../btrfs filesystem resize get_min_size ../tests/mnt
> 6480199680 bytes (6.04GiB)
> min size = 6480199680
> ### root_helper ../btrfs filesystem resize 6480199680 ../tests/mnt
> ERROR: unable to resize '../tests/mnt' - No space left on device
> Resize '../tests/mnt' of '6480199680'
>
> Last successful resize before this was:
> Resize '../tests/mnt' of '7553941504'

And it didn't fail for me on a 4.1 kernel at least. It produced those
2 sizes as well, but it didn't fail for any of them.

thanks

>
>>btrfs filesystem resize [devid:]get_min_size
>
> I don't think this is the right interface, IMHO this fits into the
> inspect-internal group. The syntax for 'fi resize' is a bit cumbersome, I'd
> like to avoid complicating it further. But you can keep it as-is until the
> bugs are fixed.
>
>> --- a/Makefile.in
>> +++ b/Makefile.in
>> @@ -46,7 +46,7 @@ libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \
>>  libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
>>  crc32c.h list.h kerncompat.h radix-tree.h extent-cache.h \
>>  extent_io.h ioctl.h ctree.h btrfsck.h version.h
>> -TESTS = fsck-tests.sh convert-tests.sh
>> +TESTS = fsck-tests.sh convert-tests.sh shrink-min-size-tests.sh
>>
>>  prefix ?= @prefix@
>>  exec_prefix = @exec_prefix@
>> @@ -161,6 +161,10 @@ $(BUILDDIRS):
>>   @echo "Making all in $(patsubst build-%,%,$@)"
>>   $(Q)$(MAKE) $(MAKEOPTS) -C $(patsubst build-%,%,$@)
>>
>> +test-shrink-min-size: btrfs mkfs.btrfs
>> + @echo "[TEST]   shrink-min-size-tests.sh"
>> + $(Q)bash tests/shrink-min-size-tests.sh
>
> Please move the test under test-misc; for now it's a catch-all category.
>
> General comments to the test script(s):
> * the tests are supposed to be run unprivileged as much as possible, so the
>   SUDO_HELPER needs to be used explicitly
> * use the git binaries everywhere
>
> (diff below)
>
>> +shrink_test()
>> +{
>> + min_size=$(btrfs filesystem resize get_min_size $TEST_MNT)
>
> sudo helper, use binary from git
>
>> + if [ $? != 0 ]; then
>> + _fail "Failed to get minimum size"
>> + fi
>> + min_size=$(echo $min_size | cut -d ' ' -f 1)
>> + echo "min size = ${min_size}" >> $RESULTS
>> + run_check btrfs filesystem resize $min_size $TEST_MNT
>
> sudo helper, use binary from git
>
>> +}
>> +
>> +run_check truncate -s 20G $IMAGE
>> +run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $IMAGE
>
> sudo helper not necessary, it's a file-backed image
>
>> +run_check $SUDO_HELPER mount $IMAGE $TEST_MNT
>> +
>> +# Create 7 data block groups, each with a size of 1Gb.
>> +for ((i = 1; i <= 7; i++)); do
>> + run_check fallocate -l 1G $TEST_MNT/foo$i
>
> This failed as the mountpoint is created by root and I'm running the
> test under my user. A chown after mount fixed it.
>
>> +done
> [...]
>
> @@ -23,19 +23,23 @@ setup_root_helper
>
>  shrink_test()
>  {
> -   min_size=$(btrfs filesystem resize get_min_size $TEST_MNT)
> +   min_size=$(run_check_stdout $SUDO_HELPER $TOP/btrfs filesystem resize get_min_size $TEST_MNT)
> if [ $? != 0 ]; then
> _fail "Failed to get minimum size"
> fi
> min_size=$(echo $min_size | cut -d ' ' -f 1)
> echo "min size = ${min_size}" >> $RESULTS
> -   run_check btrfs filesystem resize $min_size $TEST_MNT
> +   run_check $SUDO_HELPER $TOP/btrfs filesystem resize $min_size $TEST_MNT
>  }
>
>  run_check truncate -s 20G $IMAGE
> -run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $IMAGE
> +run_check $TOP/mkfs.btrfs -f $IMAGE
>  run_check $SUDO_HELPER mount $IMAGE $TEST_MNT
>
> +user=$(id -un)
> +group=$(id -gn)
> +run_check $SUDO_HELPER chown $user:$group $TEST_MNT
> +
>  # Create 7 data block groups, each with a size of 1Gb.
>  for ((i = 1; i <= 7; i++)); do
> run_check fallocate 

Re: Linux 4.1.0: BTRFS: error (device sdb) in btrfs_run_delayed_refs:2821: errno=-17 Object already exists

2015-06-29 Thread Martin Tippmann
Some more data:

After unmounting the read-only mounted partition, this notice appears
in the kernel log:

VFS: Busy inodes after unmount of sdb. Self-destruct in 5 seconds.
Have a nice day...


Here is the btrfs check output. btrfs check --repair seems to fix the issues:

# btrfs check /dev/sdb
Checking filesystem on /dev/sdb
UUID: 4b0fdfb4-8d9e-43e4-b51c-15446a792d9b
checking extents
checking free space cache
block group 414527258624 has wrong amount of free spacefailed to load
free space cache for block group 414527258624
checking fs roots
root 5 inode 444968 errors 400, nbytes wrong
root 5 inode 446183 errors 400, nbytes wrong
root 5 inode 455719 errors 400, nbytes wrong
root 5 inode 47 errors 400, nbytes wrong
root 5 inode 468041 errors 400, nbytes wrong
root 5 inode 489381 errors 400, nbytes wrong
root 5 inode 512320 errors 400, nbytes wrong
root 5 inode 521936 errors 400, nbytes wrong
root 5 inode 522017 errors 400, nbytes wrong
root 5 inode 527486 errors 400, nbytes wrong
root 5 inode 528830 errors 400, nbytes wrong
root 5 inode 548638 errors 400, nbytes wrong
root 5 inode 555329 errors 400, nbytes wrong
root 5 inode 571448 errors 400, nbytes wrong
root 5 inode 573600 errors 400, nbytes wrong
root 5 inode 589105 errors 400, nbytes wrong
root 5 inode 590616 errors 400, nbytes wrong
root 5 inode 595230 errors 400, nbytes wrong
root 5 inode 610324 errors 400, nbytes wrong
root 5 inode 611844 errors 400, nbytes wrong
root 5 inode 611972 errors 400, nbytes wrong
root 5 inode 612313 errors 400, nbytes wrong
root 5 inode 613156 errors 400, nbytes wrong
root 5 inode 614017 errors 400, nbytes wrong
root 5 inode 616045 errors 400, nbytes wrong
root 5 inode 619160 errors 400, nbytes wrong
root 5 inode 626453 errors 400, nbytes wrong
root 5 inode 626648 errors 400, nbytes wrong
root 5 inode 629338 errors 400, nbytes wrong
root 5 inode 633583 errors 400, nbytes wrong
root 5 inode 638616 errors 400, nbytes wrong
root 5 inode 639023 errors 400, nbytes wrong
root 5 inode 640373 errors 400, nbytes wrong
root 5 inode 641524 errors 400, nbytes wrong
root 5 inode 641586 errors 400, nbytes wrong
root 5 inode 645090 errors 400, nbytes wrong
root 5 inode 655241 errors 400, nbytes wrong
root 5 inode 674139 errors 400, nbytes wrong
found 726034010941 bytes used err is 1
total csum bytes: 707524756
total tree bytes: 1528659968
total fs tree bytes: 453296128
total extent tree bytes: 243171328
btree space waste bytes: 253506582
file data blocks allocated: 724722012160
 referenced 848544292864
btrfs-progs v4.0
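The `errors 400` field appears to be a hexadecimal bitmask of per-inode problems, with the text after the comma naming the set bits. A sketch of decoding such lines, assuming (an unverified reading of btrfs-progs' check code, not something stated in this thread) that 0x100 means "file extent discount", 0x200 "dir isize wrong" and 0x400 "nbytes wrong"; the table and both functions are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit meanings; verify against the btrfs-progs source in use. */
static const struct {
    unsigned int bit;
    const char *name;
} inode_err_names[] = {
    { 0x100, "file extent discount" },
    { 0x200, "dir isize wrong" },
    { 0x400, "nbytes wrong" },
};

/* Extract the inode number and hex mask from a line like
 * "root 5 inode 47 errors 400, nbytes wrong". Returns 1 on success. */
static int parse_errors(const char *line, unsigned long long *ino,
                        unsigned int *mask)
{
    unsigned long long root;
    return sscanf(line, "root %llu inode %llu errors %x", &root, ino, mask) == 3;
}

/* Write the names of the known bits set in 'mask' into buf, comma-separated. */
static void decode_inode_errors(unsigned int mask, char *buf, size_t len)
{
    size_t count = sizeof(inode_err_names) / sizeof(inode_err_names[0]);

    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (!(mask & inode_err_names[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, ", ", len - strlen(buf) - 1);
        strncat(buf, inode_err_names[i].name, len - strlen(buf) - 1);
    }
}
```

Under that assumption, every inode listed above has only bit 0x400 set, i.e. only its nbytes accounting is wrong.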


regards
Martin


Re: [PATCH 5/5] btrfs: don't update mtime on deduped inodes

2015-06-29 Thread Mark Fasheh
On Sat, Jun 27, 2015 at 05:44:28PM -0400, Zygo Blaxell wrote:
> On Fri, Jun 26, 2015 at 02:01:01PM -0700, Mark Fasheh wrote:
> > One issue users have reported is that dedupe changes mtime on files,
> > resulting in tools like rsync thinking that their contents have changed when
> > in fact the data is exactly the same. Clone still wants an mtime change, so
> > we special case this in the code.
> > 
> > This was tested with the btrfs-extent-same tool.
> > 
> > Signed-off-by: Mark Fasheh 
> > ---
> >  fs/btrfs/ioctl.c | 25 +++--
> >  1 file changed, 15 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 83f4679..0af0f13 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
> >  
> >  
> >  static int btrfs_clone(struct inode *src, struct inode *inode,
> > -  u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> > +  u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> > +  int no_mtime);
> >  
> >  /* Mask out flags that are inappropriate for the given type of inode. */
> >  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> > @@ -3054,7 +3055,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> > /* pass original length for comparison so we stay within i_size */
> > ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> > if (ret == 0)
> > -   ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> > +   ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> >  
> > if (same_inode)
> > unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> > @@ -3219,13 +3220,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
> >  struct inode *inode,
> >  u64 endoff,
> >  const u64 destoff,
> > -const u64 olen)
> > +const u64 olen,
> > +int no_mtime)
> >  {
> > struct btrfs_root *root = BTRFS_I(inode)->root;
> > int ret;
> >  
> > inode_inc_iversion(inode);
> > -   inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > +   if (no_mtime)
> > +   inode->i_ctime = CURRENT_TIME;
> 
> I don't see a good reason to modify the ctime either.  Again, nothing
> is changing here.  All we are doing is shuffling physical storage around.
> 
> Defrag and balance (which also move physical extents around) don't
> touch ctime, mtime, or even atime.

To be fair, those may actually be oversights, it's not uncommon to update
ctime on metadata changes.

Does a ctime change hurt any backup software (the reason for my first
patch)? I guess it could cause re-evaluation of metadata, but does that
actually happen? From what I can tell stuff like rsync is using mtime +
i_size to see if an inode changed.

Is there any software out there that monitors an inode's extent state which
might *want* ctime updates when this happens? Is that kind of usage a
stretch (or even something we care about)?

So my thinking is if it doesn't hurt anything, leave it in. Obviously if it
*is* causing issues then we should take it right out :)

Thanks for the discussion and review btw,
--Mark

--
Mark Fasheh


Re: [PATCH 5/5] btrfs: don't update mtime on deduped inodes

2015-06-29 Thread Zygo Blaxell
On Mon, Jun 29, 2015 at 10:52:41AM -0700, Mark Fasheh wrote:
> On Sat, Jun 27, 2015 at 05:44:28PM -0400, Zygo Blaxell wrote:
> > On Fri, Jun 26, 2015 at 02:01:01PM -0700, Mark Fasheh wrote:
> > > One issue users have reported is that dedupe changes mtime on files,
> > > resulting in tools like rsync thinking that their contents have changed when
> > > in fact the data is exactly the same. Clone still wants an mtime change, so
> > > we special case this in the code.
> > > 
> > > This was tested with the btrfs-extent-same tool.
> > > 
> > > Signed-off-by: Mark Fasheh 
> > > ---
> > >  fs/btrfs/ioctl.c | 25 +++--
> > >  1 file changed, 15 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > index 83f4679..0af0f13 100644
> > > --- a/fs/btrfs/ioctl.c
> > > +++ b/fs/btrfs/ioctl.c
> > > @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
> > >  
> > >  
> > >  static int btrfs_clone(struct inode *src, struct inode *inode,
> > > -u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> > > +u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> > > +int no_mtime);
> > >  
> > >  /* Mask out flags that are inappropriate for the given type of inode. */
> > >  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> > > @@ -3054,7 +3055,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> > >   /* pass original length for comparison so we stay within i_size */
> > >   ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> > >   if (ret == 0)
> > > - ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> > > + ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> > >  
> > >   if (same_inode)
> > >   unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> > > @@ -3219,13 +3220,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
> > >struct inode *inode,
> > >u64 endoff,
> > >const u64 destoff,
> > > -  const u64 olen)
> > > +  const u64 olen,
> > > +  int no_mtime)
> > >  {
> > >   struct btrfs_root *root = BTRFS_I(inode)->root;
> > >   int ret;
> > >  
> > >   inode_inc_iversion(inode);
> > > - inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > > + if (no_mtime)
> > > + inode->i_ctime = CURRENT_TIME;
> > 
> > I don't see a good reason to modify the ctime either.  Again, nothing
> > is changing here.  All we are doing is shuffling physical storage around.
> > 
> > Defrag and balance (which also move physical extents around) don't
> > touch ctime, mtime, or even atime.
> 
> To be fair, those may actually be oversights, it's not uncommon to update
> ctime on metadata changes.

It makes no sense semantically.  There are no changes to any inode
fields here.  Normally when extents are moved around inodes don't change
at all.

The current balance behavior is definitely not an oversight.  Balance
would have to rewrite every inode on the filesystem (actually every
*snapshot* of every inode) to update ctime.

Defrag is copy-in-place while holding a lock to prevent concurrent
modifications.  Defrag does the complementary operation to extent-same.
Defrag will also not change any file contents, and it's already got the
correct ctime behavior.  ;)

> Does a ctime change hurt any backup software (the reason for my first
> patch)? I guess it could cause re-evaluation of metadata, but does that
> actually happen? From what I can tell stuff like rsync is using mtime +
> i_size to see if an inode changed.

Off the top of my head, git uses ctime to figure out whether its index is
up to date.  It probably borrowed that from other SCMs.

Some admins (myself included) build ad-hoc (and even formal) forensic
timelines from ctime data.  This doesn't work if dedup wipes out the
ctime, especially if you are doing aggressive dedup across multiple
snapshots.

The core problem is that deduping stops being a transparent rearrangement
of the physical layout of the data (like defrag or balance) if the ctime
changes whenever you do it.
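To make the ctime concern concrete: a tool that caches stat data and compares it later (git's index being the example mentioned above) has to treat any ctime difference as a possible content change. A minimal sketch of such a check using POSIX `stat` fields; `struct cached_stat`, `remember()` and `looks_changed()` are hypothetical, and real tools compare more fields than this:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical cached subset of stat data, as an index might record it. */
struct cached_stat {
    time_t mtime;
    time_t ctime;
    off_t size;
    ino_t ino;
};

static void remember(struct cached_stat *c, const struct stat *st)
{
    c->mtime = st->st_mtime;
    c->ctime = st->st_ctime;
    c->size = st->st_size;
    c->ino = st->st_ino;
}

/* Any differing field means the tool must re-read the file, so an operation
 * that bumps only ctime still forces a rescan of identical data. */
static bool looks_changed(const struct cached_stat *c, const struct stat *st)
{
    return c->mtime != st->st_mtime || c->ctime != st->st_ctime ||
           c->size != st->st_size || c->ino != st->st_ino;
}
```

In a real tool the `struct stat` would come from `stat()` or `lstat()` on the file; a dedupe that left mtime alone but bumped ctime would still make `looks_changed()` return true.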

> Is there any software out there that monitors an inode's extent state which
> might *want* ctime updates when this happens? Is that kind of usage a
> stretch (or even something we care about)?
> 
> So my thinking is if it doesn't hurt anything, leave it in. Obviously if it
> *is* causing issues then we should take it right out :)
> 
> Thanks for the discussion and review btw,
>   --Mark
> 
> --
> Mark Fasheh
> 




Re: [PATCH 5/5] btrfs: don't update mtime on deduped inodes

2015-06-29 Thread Mark Fasheh
On Mon, Jun 29, 2015 at 03:35:02PM -0400, Zygo Blaxell wrote:
> On Mon, Jun 29, 2015 at 10:52:41AM -0700, Mark Fasheh wrote:
> > On Sat, Jun 27, 2015 at 05:44:28PM -0400, Zygo Blaxell wrote:
> > > On Fri, Jun 26, 2015 at 02:01:01PM -0700, Mark Fasheh wrote:
> > > > One issue users have reported is that dedupe changes mtime on files,
> > > > resulting in tools like rsync thinking that their contents have changed when
> > > > in fact the data is exactly the same. Clone still wants an mtime change, so
> > > > we special case this in the code.
> > > > 
> > > > This was tested with the btrfs-extent-same tool.
> > > > 
> > > > Signed-off-by: Mark Fasheh 
> > > > ---
> > > >  fs/btrfs/ioctl.c | 25 +++--
> > > >  1 file changed, 15 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > > index 83f4679..0af0f13 100644
> > > > --- a/fs/btrfs/ioctl.c
> > > > +++ b/fs/btrfs/ioctl.c
> > > > @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
> > > >  
> > > >  
> > > >  static int btrfs_clone(struct inode *src, struct inode *inode,
> > > > -  u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> > > > +  u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> > > > +  int no_mtime);
> > > >  
> > > >  /* Mask out flags that are inappropriate for the given type of inode. */
> > > >  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> > > > @@ -3054,7 +3055,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> > > > /* pass original length for comparison so we stay within i_size */
> > > > ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> > > > if (ret == 0)
> > > > -   ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> > > > +   ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> > > >  
> > > > if (same_inode)
> > > > unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> > > > @@ -3219,13 +3220,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
> > > >  struct inode *inode,
> > > >  u64 endoff,
> > > >  const u64 destoff,
> > > > -const u64 olen)
> > > > +const u64 olen,
> > > > +int no_mtime)
> > > >  {
> > > > struct btrfs_root *root = BTRFS_I(inode)->root;
> > > > int ret;
> > > >  
> > > > inode_inc_iversion(inode);
> > > > -   inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > > > +   if (no_mtime)
> > > > +   inode->i_ctime = CURRENT_TIME;
> > > 
> > > I don't see a good reason to modify the ctime either.  Again, nothing
> > > is changing here.  All we are doing is shuffling physical storage around.
> > > 
> > > Defrag and balance (which also move physical extents around) don't
> > > touch ctime, mtime, or even atime.
> > 
> > To be fair, those may actually be oversights, it's not uncommon to update
> > ctime on metadata changes.
> 
> It makes no sense semantically.  There are no changes to any inode
> fields here.  Normally when extents are moved around inodes don't change
> at all.
> 
> The current balance behavior is definitely not an oversight.  Balance
> would have to rewrite every inode on the filesystem (actually every
> *snapshot* of every inode) to update ctime.
> 
> Defrag is copy-in-place while holding a lock to prevent concurrent
> modifications.  Defrag does the complementary operation to extent-same.
> Defrag will also not change any file contents, and it's already got the
> correct ctime behavior.  ;)

Ahh ok sure that makes sense :)


> > Does a ctime change hurt any backup software (the reason for my first
> > patch)? I guess it could cause re-evaluation of metadata, but does that
> > actually happen? From what I can tell stuff like rsync is using mtime +
> > i_size to see if an inode changed.
> 
> Off the top of my head, git uses ctime to figure out whether its index is
> up to date.  It probably borrowed that from other SCMs.
> 
> Some admins (myself included) build ad-hoc (and even formal) forensic
> timelines from ctime data.  This doesn't work if dedup wipes out the
> ctime, especially if you are doing aggressive dedup across multiple
> snapshots.
> 
> The core problem is that deduping stops being a transparent rearrangement
> of the physical layout of the data (like defrag or balance) if the ctime
> changes whenever you do it.

Ok, thanks for describing those use cases. It makes sense now to me why we
would want to avoid the ctime changes. I should have another patch out
shortly.
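The rule this thread converges on can be summarized in a small userspace model: a clone changes file contents, so it updates mtime and ctime; a dedupe leaves both alone, since it only rearranges physical storage. This is a hypothetical sketch, not the follow-up kernel patch; `struct fake_inode` and `finish_inode_update()` are illustrative names, and the real code operates on `struct inode` inside a transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the timestamp fields of an inode. */
struct fake_inode {
    time_t mtime;
    time_t ctime;
};

/* Only a real content change (clone) touches mtime/ctime; a dedupe of
 * identical data is a transparent layout change, like defrag or balance. */
static void finish_inode_update(struct fake_inode *inode, bool is_dedupe,
                                time_t now)
{
    if (!is_dedupe)
        inode->mtime = inode->ctime = now;
}
```

Keeping the decision in one flag mirrors the shape of the original patch, which threaded a `no_mtime` argument through `btrfs_clone()`; here the flag simply suppresses both timestamps.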

Thanks again,
--Mark

--
Mark Fasheh

[GIT PULL] Btrfs

2015-06-29 Thread Chris Mason
Hi Linus,

Please pull my for-linus-4.2 branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-4.2

Outside of our usual batch of fixes, this integrates the subvolume quota
updates that Qu Wenruo from Fujitsu has been working on for a few
releases now.  He gets an extra gold star for making btrfs smaller this
time, and fixing a number of quota corners in the process.

Dave Sterba tested and integrated Anand Jain's sysfs improvements.
Outside of exporting a symbol (ack'd by Greg) these are all internal to
btrfs and it's mostly cleanups and fixes.  Anand also attached some of
our sysfs objects to our internal device management structs instead of
an object off the super block.  It will make device management easier
overall and it's a better fit for how the sysfs files are used.  None of
the existing sysfs files are moved around.

Thanks for all the fixes everyone:

Anand Jain (28) commits (+304/-115):
Btrfs: sysfs: move super_kobj and device_dir_kobj from fs_info to btrfs_fs_devices (+56/-43)
Btrfs: sysfs: fix, btrfs_release_super_kobj() should to clean up the kobject data (+2/-0)
Btrfs: sysfs: introduce function btrfs_sysfs_add_fsid() to create sysfs fsid (+14/-1)
Btrfs: sysfs: fix, fs_info kobject_unregister has init_completion() twice (+0/-1)
Btrfs: sysfs btrfs_kobj_rm_device() pass fs_devices instead of fs_info (+10/-10)
Btrfs: sysfs: rename __btrfs_sysfs_remove_one to btrfs_sysfs_remove_fsid (+4/-4)
Btrfs: sysfs: fix, kobject pointer clean up needed after kobject release (+1/-0)
Btrfs: sysfs btrfs_kobj_add_device() pass fs_devices instead of fs_info (+6/-7)
Btrfs: sysfs: don't fail seeding for the sake of sysfs kobject issue (+1/-1)
Btrfc: sysfs: fix, check if device_dir_kobj is init before destroy (+6/-4)
Btrfs: sysfs: provide framework to remove all fsid sysfs kobject (+16/-1)
Btrfs: sysfs: separate device kobject and its attribute creation (+15/-6)
Btrfs: sysfs: add support to show replacing target in the sysfs (+7/-1)
Btrfs: check error before reporting missing device and add uuid (+2/-1)
Btrfs: sysfs: add pointer to access fs_info from fs_devices (+25/-0)
Btrfs: sysfs: btrfs_sysfs_remove_fsid() make it non static (+2/-1)
Btrfs: sysfs: let default_attrs be separate from the kset (+8/-4)
Btrfs: sysfs: separate kobject and attribute creation (+19/-14)
Btrfs: sysfs: make btrfs_sysfs_add_device() non static (+1/-0)
Btrfs: sysfs: make btrfs_sysfs_add_fsid() non static (+3/-1)
Btrfs: introduce btrfs_get_fs_uuids to get fs_uuids (+5/-0)
Btrfs: Check if kobject is initialized before put (+5/-3)
Btrfs: sysfs: add support to add parent for fsid (+2/-2)
Btrfs: sysfs: reorder the kobject creations (+13/-10)
Btrfs: sysfs: fix, undo sysfs device links (+17/-0)
Btrfs: log when missing device is created (+2/-0)
lib: export symbol kobject_move() (+1/-0)
Btrfs: free the stale device (+61/-0)

Qu Wenruo (19) commits (+879/-1542):
btrfs: extent-tree: Use ref_node to replace unneeded parameters in __inc_extent_ref() and __free_extent() (+21/-21)
btrfs: qgroup: Make snapshot accounting work with new extent-oriented (+33/-20)
btrfs: qgroup: Add the ability to skip given qgroup for old/new_roots. (+40/-0)
btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism. (+89/-27)
btrfs: delayed-ref: Use list to replace the ref_root in ref_head. (+114/-123)
btrfs: qgroup: Cleanup open-coded old/new_refcnt update and read. (+54/-41)
btrfs: qgroup: Switch to new extent-oriented qgroup mechanism. (+28/-100)
btrfs: qgroup: Record possible quota-related extent for qgroup. (+95/-7)
btrfs: backref: Don't merge refs which are not for same block. (+3/-3)
btrfs: qgroup: Cleanup the old ref_node-oriented mechanism. (+3/-972)
btrfs: backref: Add special time_seq == (u64)-1 case for (+29/-6)
btrfs: qgroup: Add function qgroup_update_counters(). (+120/-0)
btrfs: qgroup: Add new function to record old_roots. (+29/-0)
btrfs: delayed-ref: Cleanup the unneeded functions. (+0/-174)
btrfs: qgroup: Add new qgroup calculation function (+118/-0)
btrfs: qgroup: Add function qgroup_update_refcnt(). (+58/-0)
btrfs: qgroup: Switch rescan to new mechanism. (+7/-36)
btrfs: ulist: Add ulist_del() function. (+37/-11)
btrfs: Fix superblock csum type check. (+1/-1)

Filipe Manana (14) commits (+340/-76):
Btrfs: incremental send, check if orphanized dir inode needs delayed rename (+37/-19)
Btrfs: fix necessary chunk tree space calculation when allocating a chunk (+7/-12)
Btrfs: wake up extent state waiters on unlock through clear_extent_bits (+6/-1)
Btrfs: incremental send, fix clone operations for compressed extents (+17/-1)
Btrfs: incremental send, don't delay directory renames unnecessarily (+46/-2)
Btrfs: fix chunk allocation regression leading to transaction abort (+19/-3)
Btrfs: fix mute

[GIT PULL] Small cleanup and fix from Fujitsu

2015-06-29 Thread Qu Wenruo

Hi Chris,

This is a small set of cleanups and fixes for 4.2. Compared to the previous
pull, this one is quite small: only 2 cleanups and one small quota fix.


Yang Dongsheng (1):
  btrfs: qgroup: allow user to clear the limitation on qgroup

Zhao Lei (2):
  btrfs: cleanup noused initialization of dev in btrfs_end_bio()
  btrfs: add error handling for scrub_workers_get()

Please pull these small fixes from my tree
https://github.com/adam900710/linux.git for_chris_4.2_part2

Thanks,
Qu


[PATCH][RESEND] btrfs: fix search key advancing condition

2015-06-29 Thread Naohiro Aota
The search key advancing condition used in copy_to_sk() is loose. It can
advance the key even after it has reached sk->max_*: e.g. when the max key =
(512, 1024, -1) and the current key = (512, 1025, 10), it increments the
offset by 1 and continues a hopeless search from (512, 1025, 11). This issue
makes ioctl() take an unexpectedly long time scanning all the leaf blocks
one by one.

This commit fixes the problem by using the standard way of comparing keys:
btrfs_comp_cpu_keys()

Signed-off-by: Naohiro Aota 
---
 fs/btrfs/ioctl.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 1c22c65..07dc01d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1932,6 +1932,7 @@ static noinline int copy_to_sk(struct btrfs_root *root,
u64 found_transid;
struct extent_buffer *leaf;
struct btrfs_ioctl_search_header sh;
+   struct btrfs_key test;
unsigned long item_off;
unsigned long item_len;
int nritems;
@@ -2015,12 +2016,17 @@ static noinline int copy_to_sk(struct btrfs_root *root,
}
 advance_key:
ret = 0;
-   if (key->offset < (u64)-1 && key->offset < sk->max_offset)
+   test.objectid = sk->max_objectid;
+   test.type = sk->max_type;
+   test.offset = sk->max_offset;
+   if (btrfs_comp_cpu_keys(key, &test) >= 0)
+   ret = 1;
+   else if (key->offset < (u64)-1)
key->offset++;
-   else if (key->type < (u8)-1 && key->type < sk->max_type) {
+   else if (key->type < (u8)-1) {
key->offset = 0;
key->type++;
-   } else if (key->objectid < (u64)-1 && key->objectid < sk->max_objectid) {
+   } else if (key->objectid < (u64)-1) {
key->offset = 0;
key->type = 0;
key->objectid++;
-- 
2.4.4



Re: [PATCH 4/4] btrfs: Fix data checksum error cause by replace with io-load.

2015-06-29 Thread Qu Wenruo

To Chris:

Would you consider merging this patchset in the late 4.2 merge window?
If it's OK to merge it into a late 4.2 rc, we'll start our testing and send 
a pull request after testing, ETA this Friday or next Monday.


I know that normally we should submit early, especially when a fix is 
not small.
But the bug is long-standing and quite annoying (it only reproduces 
intermittently), and Zhao Lei also has a good idea for cleaning up the 
scrub code on top of this patchset.


So it would be very nice if there were any chance of merging it into 4.2.

Would that be OK with you?

Thanks,
Qu

Zhaolei wrote on 2015/05/29 19:55 +0800:

From: Zhao Lei 

xfstests btrfs/070 sometimes fails.
On my test machine, its failure rate is about 30%.
In another VM (VMware), its failure rate is about 50%.

Reason:
   btrfs/070 runs replace and defrag concurrently with fsstress;
   after these operations, scrub finds checksum errors.

   Actually, defrag is not involved: replace with fsstress alone
   can trigger this bug.

   New data written to the target device can be overwritten with
   old data from the source device by the replace code (found while
   debugging). To avoid this, we can set the target block group
   read-only during the replace, so new data written by other
   operations will not land in the same place the replace code is
   writing to.

   Before patch (4.1-rc3):
 30% failed in 100 xfstests runs.
   After patch:
 0% failed in 300 xfstests runs.

Changelog v1->v2:
1: Update subject to reflect the problem being fixed.
2: Update description to explain why setting read-only fixes the
problem.
3: Use a helper function to avoid duplicating the code block that
sets the chunk read-only.
All of the above were suggested by: David Sterba 

Reported-by: Qu Wenruo 
Suggested-by: Qu Wenruo 
Signed-off-by: Qu Wenruo 
Signed-off-by: Zhao Lei 
---
  fs/btrfs/scrub.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 8da3459..e1ebf43 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3455,6 +3455,18 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
if (!cache)
goto skip;

+   /*
+* we need to call btrfs_inc_block_group_ro() with scrubs_paused,
+* to avoid deadlock caused by:
+* btrfs_inc_block_group_ro()
+* -> btrfs_wait_for_commit()
+* -> btrfs_commit_transaction()
+* -> btrfs_scrub_pause()
+*/
+   scrub_pause_on(fs_info);
+   btrfs_inc_block_group_ro(root, cache);
+   scrub_pause_off(fs_info);
+
dev_replace->cursor_right = found_key.offset + length;
dev_replace->cursor_left = found_key.offset;
dev_replace->item_needs_writeback = 1;
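The race described in the commit message above can be illustrated with a toy, single-threaded C model that replays one bad interleaving. Everything here (`struct model`, `writer()`, `run_interleaving()`, the four-block group) is hypothetical and greatly simplified: real dev-replace works on extents, not single bytes, and the `group_ro` flag only approximates the allocator effect of btrfs_inc_block_group_ro().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4

/* Toy model: one block group present on a source and a target device
 * while dev-replace copies it block by block. */
struct model {
	char source[NBLOCKS];
	char target[NBLOCKS];
	bool group_ro;          /* stands in for btrfs_inc_block_group_ro() */
	int written_elsewhere;  /* writes the allocator sent to another group */
};

/* A concurrent writer: during replace, new writes go to both devices.
 * If the group is read-only, the allocator must pick another group. */
static void writer(struct model *m, int blk, char data)
{
	assert(blk >= 0 && blk < NBLOCKS);
	if (m->group_ro) {
		m->written_elsewhere++;
		return;
	}
	m->source[blk] = data;
	m->target[blk] = data;
}

/* Replace reads old data, the writer writes new data to the same block,
 * then replace copies its stale buffer onto the target. Returns true if
 * source and target agree afterwards (i.e. scrub would be happy). */
bool run_interleaving(bool set_ro)
{
	struct model m;

	memset(&m, 0, sizeof(m));
	memset(m.source, 'o', NBLOCKS);    /* 'o' = old data */
	m.group_ro = set_ro;

	for (int blk = 0; blk < NBLOCKS; blk++) {
		char buf = m.source[blk];  /* replace: read from source */
		if (blk == 2)
			writer(&m, blk, 'n');  /* writer races in on block 2 */
		m.target[blk] = buf;       /* replace: write to target */
	}
	return memcmp(m.source, m.target, NBLOCKS) == 0;
}
```

Without the read-only flag, block 2 ends up new on the source but old on the target; with it, the new write is redirected and the copy stays consistent.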




[PATCH 2/2] [btrfs] btrfs_rename(): don't ignore btrfs_end_transaction() return

2015-06-29 Thread Davide C. C. Italiano
From: Davide Italiano 

btrfs_end_transaction() can return an error -- this happens, e.g.,
if it tries to commit and the transaction was aborted in the meantime.
Swallowing the error is wrong, so explicitly return it.

Signed-off-by: Davide Italiano 
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 59c475c..61b26be 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9199,7 +9199,8 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
btrfs_end_log_trans(root);
}
 out_fail:
-   btrfs_end_transaction(trans, root);
+   if (!ret)
+   ret = btrfs_end_transaction(trans, root);
 out_notrans:
if (old_ino == BTRFS_FIRST_FREE_OBJECTID)
up_read(&root->fs_info->subvol_sem);
-- 
2.4.3
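A common C pattern for propagating an end-of-transaction error is to release the handle on every path and keep the first error seen. The sketch below uses hypothetical `struct trans`, `end_transaction()` and `finish()`. Note that in the hunk above btrfs_end_transaction() is only reached when ret is zero; this sketch simply assumes the handle must always be ended, which is a design assumption, not something the patch states.

```c
#include <assert.h>

/* Hypothetical transaction handle: records whether it was released and
 * what error ending it reports (e.g. -EIO after an abort). */
struct trans {
	int ended;
	int end_error;
};

int end_transaction(struct trans *t)
{
	assert(!t->ended);      /* must not be ended twice */
	t->ended = 1;
	return t->end_error;
}

/* Tail of an operation: always release the handle, report the first error. */
int finish(struct trans *t, int ret)
{
	int ret2 = end_transaction(t);

	return ret ? ret : ret2;
}
```

An earlier failure wins over the end-of-transaction error, but the handle is ended either way.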



[PATCH 11/11] reflink: test what happens when we hit resource limits

2015-06-29 Thread Darrick J. Wong
Signed-off-by: Darrick J. Wong 
---
 tests/generic/840 |  107 +
 tests/generic/840.out |0 
 tests/generic/841 |   87 
 tests/generic/841.out |5 ++
 tests/generic/group   |2 +
 5 files changed, 201 insertions(+)
 create mode 100755 tests/generic/840
 create mode 100644 tests/generic/840.out
 create mode 100755 tests/generic/841
 create mode 100644 tests/generic/841.out


diff --git a/tests/generic/840 b/tests/generic/840
new file mode 100755
index 000..498e3a2
--- /dev/null
+++ b/tests/generic/840
@@ -0,0 +1,107 @@
+#! /bin/bash
+# FS QA Test No. 840
+#
+# Try to hit the maximum reference count (eek!)
+#
+# This test runs extremely slowly, so it's not automatically run anywhere.
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+rm -rf $tmp.* $TESTDIR1
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+
+# real QA test starts here
+_require_test
+_require_scratch
+_require_scratch_reflink
+_require_test_reflink
+_supported_os Linux
+
+_require_xfs_io_command "reflink"
+_require_xfs_io_command "fiemap"
+_require_cp_reflink
+
+rm -f $seqres.full
+
+echo "Format and mount"
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+TESTDIR=$SCRATCH_MNT/test-$seq
+rm -rf $TESTDIR
+mkdir $TESTDIR
+
+# Well let's hope the maximum reflink count is (less than (ha!)) 2^32...
+
+echo "Create a one block file"
+BLKSZ="$(stat -f $TESTDIR -c '%S')"
+$XFS_IO_PROG -f -c "pwrite -S 0x61 0 $BLKSZ" $TESTDIR/file1 >> $seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0x62 0 $BLKSZ" $TESTDIR/file2 >> $seqres.full
+
+$XFS_IO_PROG -f -c "reflink $TESTDIR/file1 0 0 $BLKSZ" $TESTDIR/file2 >> $seqres.full
+
+nr=32
+fnr=18
+for i in $(seq 0 $fnr); do
+   echo " ++ Reflink size $i, $(( (2 ** i) * BLKSZ)) bytes" | tee -a $seqres.full
+   $XFS_IO_PROG -f -c "reflink $TESTDIR/file1 0 $(( (2 ** i) * BLKSZ)) $(( (2 ** i) * BLKSZ ))" $TESTDIR/file1 >> $seqres.full || break
+done
+
+nrf=$((nr - fnr))
+echo "Clone $((2 ** nrf)) files"
+for i in $(seq 0 $((2 ** nrf)) ); do
+   cp --reflink=always $TESTDIR/file1 $TESTDIR/file1-$i
+done
+
+echo "Check scratch fs"
+umount $SCRATCH_MNT
+$XFS_DB_PROG -c 'agf 0' -c 'addr rlroot' -c p $SCRATCH_DEV
+_check_scratch_fs
+
+echo "Remove big file and recheck"
+_scratch_mount >> $seqres.full 2>&1
+#rm -rf $TESTDIR/file1
+umount $SCRATCH_MNT
+_check_scratch_fs
+
+echo "Remove all files and recheck"
+_scratch_mount >> $seqres.full 2>&1
+#rm -rf $TESTDIR/file2
+umount $SCRATCH_MNT
+_check_scratch_fs
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/840.out b/tests/generic/840.out
new file mode 100644
index 000..e69de29
diff --git a/tests/generic/841 b/tests/generic/841
new file mode 100755
index 000..3bf025c
--- /dev/null
+++ b/tests/generic/841
@@ -0,0 +1,87 @@
+#! /bin/bash
+# FS QA Test No. 841
+#
+# Try to run out of space while cloning?
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
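The reference-count target in generic/840 follows from simple arithmetic. The sketch below assumes every reflink in the doubling loop and every `cp --reflink=always` succeeds, and counts one reference per file-block mapping that ends up pointing at file1's original block, which is a simplification of how a filesystem actually records shared extents. `expected_refs()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* File-block mappings sharing file1's first block after generic/840,
 * assuming every reflink and cp succeeds. */
uint64_t expected_refs(unsigned fnr, unsigned nrf)
{
	assert(fnr + 1 < 64 && nrf < 63);
	/* Loop i = 0..fnr doubles file1 each time: 1 -> 2^(fnr+1) blocks. */
	uint64_t blocks_per_file = UINT64_C(1) << (fnr + 1);
	/* `seq 0 $((2 ** nrf))` is inclusive, so 2^nrf + 1 clones. */
	uint64_t clones = (UINT64_C(1) << nrf) + 1;

	/* file1 plus every clone, plus file2's single reflinked block. */
	return blocks_per_file * (clones + 1) + 1;
}
```

With the test's `fnr=18` and `nr=32` (so `nrf=14`), the loop runs 19 times and leaves file1 at 2^19 blocks, and 2^14 + 1 clones are made, so under these assumptions the count lands slightly above 2^33 rather than at 2^32.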

Re: [PATCH 1/2] [btrfs] btrfs_rename: abort transaction in case of error.

2015-06-29 Thread Davide Italiano
On Mon, Jun 29, 2015 at 4:59 AM, Filipe David Manana  wrote:
> On Sun, Jun 28, 2015 at 10:47 PM, Davide C. C. Italiano
>  wrote:
>> From: Davide Italiano 
>>
>> btrfs_insert_inode_ref() may fail and we want to make sure
>> the transaction is aborted before calling btrfs_end_transaction(),
>> as it already happens everywhere else in this function in case
>> of error.
>>
>> Signed-off-by: Davide Italiano 
>> ---
>>  fs/btrfs/inode.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 8bb0136..59c475c 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -9114,8 +9114,11 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>>  new_dentry->d_name.len,
>>  old_ino,
>>  btrfs_ino(new_dir), index);
>> -   if (ret)
>> +   if (ret) {
>> +   btrfs_abort_transaction(trans, root, ret);
>> goto out_fail;
>> +   }
>> +
>
> Hi,
>
> I don't think we need a transaction abortion here. The reason it's not
> being done is likely because at that point the trees are in a
> consistent state (i.e. we haven't touched any of them yet) and not
> because it was forgotten. So an abortion there is
> unnecessary/excessive.
>
> thanks
>

Thank you for the comment -- I updated the other patch, but I have
mixed feelings about this one.
I can either withdraw this patch or provide a new one that adds a
comment explaining why the abort is not needed, for future reference.
Which do you prefer?

--
Davide


[PATCH 10/11] test xfs-specific reflink pieces

2015-06-29 Thread Darrick J. Wong
Check that growfs and xfs_fsr still work properly on reflinked fses.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/800 |   77 
 tests/xfs/800.out |5 ++
 tests/xfs/801 |  114 +
 tests/xfs/801.out |   15 +++
 tests/xfs/group   |2 +
 5 files changed, 213 insertions(+)
 create mode 100755 tests/xfs/800
 create mode 100644 tests/xfs/800.out
 create mode 100755 tests/xfs/801
 create mode 100644 tests/xfs/801.out


diff --git a/tests/xfs/800 b/tests/xfs/800
new file mode 100755
index 000..62b431a
--- /dev/null
+++ b/tests/xfs/800
@@ -0,0 +1,77 @@
+#! /bin/bash
+# FS QA Test No. 800
+#
+# Tests xfs_growfs on a reflinked filesystem
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. common/rc
+. common/filter
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_scratch_reflink
+_supported_os Linux
+
+_require_xfs_io_command "fiemap"
+_require_cp_reflink
+
+echo "Format and mount"
+_scratch_mkfs -d size=$((2 * 4096 * 4096)) -l size=4194304 > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+TESTDIR=$SCRATCH_MNT/test-$seq
+rm -rf $TESTDIR
+mkdir $TESTDIR
+
+echo "Create the original file and reflink to copy1, copy2"
+$XFS_IO_PROG -f -c 'pwrite -S 0x61 0 9000' $TESTDIR/original \
+>> $seqres.full 2>&1
+cp --reflink $TESTDIR/original $TESTDIR/copy1
+cp --reflink $TESTDIR/copy1 $TESTDIR/copy2
+
+echo "Grow fs"
+$XFS_GROWFS_PROG $SCRATCH_MNT 2>&1 |  _filter_growfs >> $seqres.full
+
+xfs_info $SCRATCH_MNT >> $seqres.full
+
+echo "Check scratch fs"
+umount $SCRATCH_MNT
+_check_scratch_fs
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/800.out b/tests/xfs/800.out
new file mode 100644
index 000..280daa5
--- /dev/null
+++ b/tests/xfs/800.out
@@ -0,0 +1,5 @@
+QA output created by 800
+Format and mount
+Create the original file and reflink to copy1, copy2
+Grow fs
+Check scratch fs
diff --git a/tests/xfs/801 b/tests/xfs/801
new file mode 100755
index 000..5a75e5f
--- /dev/null
+++ b/tests/xfs/801
@@ -0,0 +1,114 @@
+#! /bin/bash
+# FS QA Test No. 801
+#
+# Ensure that xfs_fsr un-reflinks files while defragmenting
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. common/rc
+. common/filter
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_scratch_reflink
+_supported_os Linux
+
+_require_xfs_io_command "fiemap"
+_require_cp_reflink
+
+echo "Format and mount"
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+TESTDIR=$SCRATCH_MNT/test-$seq
+rm -rf $TESTDIR
+mkdir $TESTDIR
+
+echo "Create the original file and reflink to copy1, copy2"
+$XFS_IO_PROG -f -c 'pwrite -S 0x61 0 66000' $TESTDIR/original \
+>> $seqres.full 2>&1
+cp --reflink $

[PATCH 09/11] test error conditions on reflink

2015-06-29 Thread Darrick J. Wong
Check that we can feed bad inputs to reflink and it'll reject them.

Signed-off-by: Darrick J. Wong 
---
 tests/generic/839 |  123 +
 tests/generic/839.out |   30 
 tests/generic/group   |1 
 3 files changed, 154 insertions(+)
 create mode 100755 tests/generic/839
 create mode 100644 tests/generic/839.out


diff --git a/tests/generic/839 b/tests/generic/839
new file mode 100755
index 000..61532df
--- /dev/null
+++ b/tests/generic/839
@@ -0,0 +1,123 @@
+#! /bin/bash
+# FS QA Test No. 839
+#
+# Check that cross-device reflink and dedupe are rejected
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+rm -rf $tmp.* $TESTDIR1
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+
+# real QA test starts here
+_require_test
+_require_scratch
+_require_scratch_reflink
+_require_test_reflink
+_supported_os Linux
+
+_require_xfs_io_command "fiemap"
+_require_xfs_io_command "reflink"
+_require_xfs_io_command "dedupe"
+
+rm -f $seqres.full
+
+echo "Format and mount"
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+TESTDIR1=$TEST_DIR/test-$seq
+rm -rf $TESTDIR1
+mkdir $TESTDIR1
+
+TESTDIR2=$SCRATCH_MNT/test-$seq
+rm -rf $TESTDIR2
+mkdir $TESTDIR2
+
+echo "Create the original files"
+BLKSZ="$(stat -f $TESTDIR1 -c '%S')"
+BLKS=1000
+MARGIN=50
+SZ=$((BLKSZ * BLKS))
+FREE_BLOCKS0=$(stat -f $TESTDIR1 -c '%f')
+NR=4
+$XFS_IO_PROG -f -c "pwrite -S 0x61 0 $SZ" $TESTDIR1/file1 >> $seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0x61 0 $SZ" $TESTDIR1/file2 >> $seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0x61 0 $SZ" $TESTDIR2/file1 >> $seqres.full
+sync
+
+echo "Try cross-device reflink"
+$XFS_IO_PROG -f -c "reflink $TESTDIR1/file1 0 0 $BLKSZ" $TESTDIR2/file1
+echo "Try cross-device dedupe"
+$XFS_IO_PROG -f -c "dedupe $TESTDIR1/file1 0 0 $BLKSZ" $TESTDIR2/file1
+
+echo "Try unaligned reflink"
+$XFS_IO_PROG -f -c "reflink $TESTDIR1/file1 37 59 23" $TESTDIR1/file2
+echo "Try unaligned dedupe"
+$XFS_IO_PROG -f -c "dedupe $TESTDIR1/file1 37 59 23" $TESTDIR1/file2
+
+echo "Try overlapping reflink"
+$XFS_IO_PROG -f -c "reflink $TESTDIR1/file1 0 0 $BLKSZ" $TESTDIR1/file1
+echo "Try overlapping dedupe"
+$XFS_IO_PROG -f -c "dedupe $TESTDIR1/file2 0 0 $BLKSZ" $TESTDIR1/file2
+
+echo "Try reflink past EOF"
+$XFS_IO_PROG -f -c "reflink $TESTDIR1/file1 $(( (BLKS + 1000) * BLKSZ)) 0 $BLKSZ" $TESTDIR1/file1
+echo "Try dedupe past EOF"
+$XFS_IO_PROG -f -c "dedupe $TESTDIR1/file2 $(( (BLKS + 1000) * BLKSZ)) 0 $BLKSZ" $TESTDIR1/file2
+
+chattr +i $TESTDIR1/file1 $TESTDIR1/file2
+echo "Try reflink on immutable files"
+$XFS_IO_PROG -f -c "reflink $TESTDIR1/file1 0 0 $BLKSZ" $TESTDIR1/file2 > $seqres.full
+echo "Try dedupe on immutable files"
+$XFS_IO_PROG -f -c "dedupe $TESTDIR1/file1 $BLKSZ $BLKSZ $BLKSZ" $TESTDIR1/file2 > $seqres.full
+chattr -i $TESTDIR1/file1 $TESTDIR1/file2
+
+echo "Reflink two files"
+$XFS_IO_PROG -f -c "reflink $TESTDIR1/file1 0 0 $BLKSZ" $TESTDIR1/file2 > $seqres.full
+$XFS_IO_PROG -f -c "reflink $TESTDIR2/file1 0 0 $BLKSZ" $TESTDIR2/file2 > $seqres.full
+echo "Dedupe two files"
+$XFS_IO_PROG -f -c "dedupe $TESTDIR1/file1 $BLKSZ $BLKSZ $BLKSZ" $TESTDIR1/file2 > $seqres.full
+$XFS_IO_PROG -f -c "dedupe $TESTDIR2/file1 $BLKSZ $BLKSZ $BLKSZ" $TESTDIR2/file2 > $seqres.full
+
+lsattr -l $TESTDIR1/ | _filter_test_dir
+lsattr -l $TESTDIR2/ | sed -e "s,$SCRATCH_MNT,SCRATCH_MNT,g"
+
+echo "Check scratch fs"
+umount $SCRATCH_MNT
+_check_scratch_fs
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/839.out b/tests/generic/839.out
new file mode 100644
index 000..91a2425
--- /dev/null
+++ b/tests/generic/839.out
@@ -0,0 +1,30 @@
+QA output created by 839
+Format and mount
+Create the original files
+Try cross-device reflink
+reflink: Invalid cross-device link
+Try cross-device dedupe
+dedupe: Invalid cross-device link
+Try unaligned reflink
+reflink: In
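The rejections generic/839 probes can be summarized as a single validation routine. `check_clone()` and `struct range_file` below are hypothetical. Only the EXDEV result for the cross-device case is confirmed by 839.out; the other errno values are assumptions, and the sketch ignores the usual exception for a range that ends exactly at EOF.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one file taking part in a reflink. */
struct range_file {
	int dev;          /* stand-in for the filesystem/mount */
	int ino;
	uint64_t size;
	bool immutable;
};

/* Returns 0 if the request looks acceptable, else a negative errno. */
int check_clone(const struct range_file *src, uint64_t src_off,
		const struct range_file *dst, uint64_t dst_off,
		uint64_t len, uint64_t blksz)
{
	assert(blksz > 0);
	if (src->dev != dst->dev)
		return -EXDEV;                    /* cross-device */
	if (src->immutable || dst->immutable)
		return -EPERM;                    /* chattr +i (assumed errno) */
	if (src_off >= src->size)
		return -EINVAL;                   /* source range past EOF */
	if (src_off % blksz || dst_off % blksz || len % blksz)
		return -EINVAL;                   /* unaligned range */
	if (src->ino == dst->ino &&
	    src_off < dst_off + len && dst_off < src_off + len)
		return -EINVAL;                   /* overlap within one file */
	return 0;
}
```

The order of the checks is arbitrary here; each case in the test trips exactly one of them.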

[PATCH 08/11] test reflink for accuracy in free block counts

2015-06-29 Thread Darrick J. Wong
Check that the free block counts seem to be handled correctly in
the reflink operation and subsequent attempts to rewrite reflinked
copies.

Signed-off-by: Darrick J. Wong 
---
 tests/generic/830 |   88 +
 tests/generic/830.out |5 +
 tests/generic/831 |  120 ++
 tests/generic/831.out |   11 +++
 tests/generic/832 |  143 +
 tests/generic/832.out |   11 +++
 tests/generic/833 |  142 
 tests/generic/833.out |   11 +++
 tests/generic/834 |  160 ++
 tests/generic/834.out |   16 +
 tests/generic/835 |  164 +++
 tests/generic/835.out |   16 +
 tests/generic/836 |  172 +
 tests/generic/836.out |   36 ++
 tests/generic/group   |7 ++
 15 files changed, 1102 insertions(+)
 create mode 100755 tests/generic/830
 create mode 100644 tests/generic/830.out
 create mode 100755 tests/generic/831
 create mode 100644 tests/generic/831.out
 create mode 100755 tests/generic/832
 create mode 100644 tests/generic/832.out
 create mode 100755 tests/generic/833
 create mode 100644 tests/generic/833.out
 create mode 100755 tests/generic/834
 create mode 100644 tests/generic/834.out
 create mode 100755 tests/generic/835
 create mode 100644 tests/generic/835.out
 create mode 100755 tests/generic/836
 create mode 100644 tests/generic/836.out


diff --git a/tests/generic/830 b/tests/generic/830
new file mode 100755
index 000..4862603
--- /dev/null
+++ b/tests/generic/830
@@ -0,0 +1,88 @@
+#! /bin/bash
+# FS QA Test No. 830
+#
+# Ensure that reflinking a file N times doesn't eat a lot of blocks
+#   - Create a file and record fs block usage
+#   - Create some reflink copies
+#   - Compare fs block usage to before
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+rm -rf $tmp.* $TESTDIR
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_require_test_reflink
+_supported_os Linux
+
+_require_xfs_io_command "fiemap"
+_require_cp_reflink
+_require_test
+
+rm -f $seqres.full
+
+TESTDIR=$TEST_DIR/test-$seq
+rm -rf $TESTDIR
+mkdir $TESTDIR
+
+echo "Create the original file blocks"
+BLKSZ="$(stat -f $TESTDIR -c '%S')"
+BLKS=1000
+MARGIN=50
+SZ=$((BLKSZ * BLKS))
+NR=7
+$XFS_IO_PROG -f -c "pwrite -S 0x61 0 $SZ" $TESTDIR/file1 >> $seqres.full
+sync
+FREE_BLOCKS0=$(stat -f $TESTDIR -c '%f')
+
+echo "Create the reflink copies"
+for i in `seq 2 $NR`; do
+   cp --reflink=always $TESTDIR/file1 $TESTDIR/file.$i
+done
+sync
+FREE_BLOCKS1=$(stat -f $TESTDIR -c '%f')
+
+echo "Compare free block count"
+DIFF=$((FREE_BLOCKS0 - FREE_BLOCKS1))
+if [ $DIFF -gt $MARGIN ]; then
+   echo "Free blocks decreased by $DIFF, more than $MARGIN."
+elif [ $DIFF -lt 0 ]; then
+   echo "Free blocks increased by $((-DIFF))?"
+else
+   echo "Looks ok"
+fi
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/830.out b/tests/generic/830.out
new file mode 100644
index 000..e2802dd
--- /dev/null
+++ b/tests/generic/830.out
@@ -0,0 +1,5 @@
+QA output created by 830
+Create the original file blocks
+Create the reflink copies
+Compare free block count
+Looks ok
diff --git a/tests/generic/831 b/tests/generic/831
new file mode 100755
index 000..4947261
--- /dev/null
+++ b/tests/generic/831
@@ -0,0 +1,120 @@
+#! /bin/bash
+# FS QA Test No. 831
+#
+# Ensure that deleting all copies of a file reflinked N times releases the blocks
+#   - Record fs block usage (0)
+#   - Create a file and some reflink copies
+#   - Record fs block usage (1)
+#   - Delete some copies of the file
+#   - Record fs block usage (2)
+#   - Delete all copies of the file
+#   - Compare fs block usage to (2), (1), and (0)
+#
+#--
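The pass/fail logic in generic/830 reduces to one subtraction and two comparisons. With BLKS=1000 and NR=7, `seq 2 $NR` makes six copies, which would cost roughly 6000 blocks if they were ordinary copies, so MARGIN=50 only leaves room for metadata. `classify()` is a hypothetical C rendering of that check.

```c
#include <assert.h>

enum verdict { LOOKS_OK, TOO_MANY_USED, FREE_INCREASED };

/* Mirrors the comparison in generic/830; counts are filesystem blocks. */
enum verdict classify(long free_before, long free_after, long margin)
{
	long diff = free_before - free_after;

	assert(margin >= 0);
	if (diff > margin)
		return TOO_MANY_USED;   /* copies consumed real data blocks */
	if (diff < 0)
		return FREE_INCREASED;  /* suspicious: space appeared */
	return LOOKS_OK;
}
```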

[PATCH 07/11] reflink concurrent operations tests

2015-06-29 Thread Darrick J. Wong
Make sure that running reflink ops while other IO is ongoing doesn't
break the filesystem.

Signed-off-by: Darrick J. Wong 
---
 tests/generic/821 |   95 
 tests/generic/821.out |6 +++
 tests/generic/822 |   95 
 tests/generic/822.out |6 +++
 tests/generic/823 |   93 +++
 tests/generic/823.out |6 +++
 tests/generic/824 |   94 +++
 tests/generic/824.out |6 +++
 tests/generic/825 |  107 +
 tests/generic/825.out |7 +++
 tests/generic/826 |  107 +
 tests/generic/826.out |7 +++
 tests/generic/827 |   97 
 tests/generic/827.out |7 +++
 tests/generic/828 |   97 
 tests/generic/828.out |7 +++
 tests/generic/829 |   83 ++
 tests/generic/829.out |6 +++
 tests/generic/group   |9 
 19 files changed, 935 insertions(+)
 create mode 100755 tests/generic/821
 create mode 100644 tests/generic/821.out
 create mode 100755 tests/generic/822
 create mode 100644 tests/generic/822.out
 create mode 100755 tests/generic/823
 create mode 100644 tests/generic/823.out
 create mode 100755 tests/generic/824
 create mode 100644 tests/generic/824.out
 create mode 100755 tests/generic/825
 create mode 100644 tests/generic/825.out
 create mode 100755 tests/generic/826
 create mode 100644 tests/generic/826.out
 create mode 100755 tests/generic/827
 create mode 100644 tests/generic/827.out
 create mode 100755 tests/generic/828
 create mode 100644 tests/generic/828.out
 create mode 100755 tests/generic/829
 create mode 100644 tests/generic/829.out


diff --git a/tests/generic/821 b/tests/generic/821
new file mode 100755
index 000..b2d4bd9
--- /dev/null
+++ b/tests/generic/821
@@ -0,0 +1,95 @@
+#! /bin/bash
+# FS QA Test No. 821
+#
+# Test for race between direct I/O and reflink
+#
+#---
+# Copyright (c) 2015 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+cd /
+rm -rf $tmp.*
+wait
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+
+_require_scratch
+_require_scratch_reflink
+_require_cp_reflink
+_require_xfs_io_command "reflink"
+
+echo "Format and mount"
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+TESTDIR=$SCRATCH_MNT/test-$seq
+rm -rf $TESTDIR
+mkdir $TESTDIR
+
+loops=512
+iosize=65536
+
+echo "Initialize files"
+echo > $seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0x61 0 $((loops * iosize))" $TESTDIR/file1 >> $seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0x62 0 $((loops * iosize))" $TESTDIR/file2 >> $seqres.full
+
+# Direct I/O overwriter...
+overwrite() {
+   while [ ! -e $TESTDIR/finished ]; do
+   dd if=/dev/zero of=$TESTDIR/file2 oflag=direct bs=$iosize count=$loops conv=notrunc 2> /dev/null
+   done
+}
+
+echo "Reflink and dio write the target"
+overwrite &
+start=`expr $loops - 1`
+for i in `seq $start -1 0`
+do
+   $XFS_IO_PROG -f -c "reflink $TESTDIR/file1 $((i * iosize)) $((i * iosize)) $iosize" $TESTDIR/file2 >> $seqres.full
+   [ $? -ne 0 ] && exit
+done
+touch $TESTDIR/finished
+wait
+
+echo "Check for damage"
+umount $SCRATCH_MNT
+_check_scratch_fs
+
+echo "Done"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/821.out b/tests/generic/821.out
new file mode 100644
index 000..ca6bc53
--- /dev/null
+++ b/tests/generic/821.out
@@ -0,0 +1,6 @@
+QA output created by 821
+Format and mount
+Initialize files
+Reflink and dio write the target
+Check for damage
+Done
diff --git a/tests/generic/822 b/tests/generic/822
new file mode 100755
index 000..cf2f914
--- /dev/null
+++ b/tests/generic/822
@@ -0,0 +1,

[PATCH 06/11] reflink fallocate tests

2015-06-29 Thread Darrick J. Wong
Check that the variants of fallocate (allocate, punch, zero range,
collapse range, insert range) do the right thing when they're run
against a range of reflinked blocks.

Signed-off-by: Darrick J. Wong 
---
 tests/generic/811 |  104 +++
 tests/generic/811.out |   39 ++
 tests/generic/812 |  106 
 tests/generic/812.out |   21 +
 tests/generic/813 |  100 +
 tests/generic/813.out |   24 +++
 tests/generic/814 |  109 +
 tests/generic/814.out |   24 +++
 tests/generic/815 |   95 +++
 tests/generic/815.out |   16 +++
 tests/generic/816 |  100 +
 tests/generic/816.out |   24 +++
 tests/generic/group   |6 +++
 13 files changed, 768 insertions(+)
 create mode 100755 tests/generic/811
 create mode 100644 tests/generic/811.out
 create mode 100755 tests/generic/812
 create mode 100644 tests/generic/812.out
 create mode 100755 tests/generic/813
 create mode 100644 tests/generic/813.out
 create mode 100755 tests/generic/814
 create mode 100644 tests/generic/814.out
 create mode 100755 tests/generic/815
 create mode 100644 tests/generic/815.out
 create mode 100755 tests/generic/816
 create mode 100644 tests/generic/816.out


diff --git a/tests/generic/811 b/tests/generic/811
new file mode 100755
index 000..2eeb1d5
--- /dev/null
+++ b/tests/generic/811
@@ -0,0 +1,104 @@
+#! /bin/bash
+# FS QA Test No. 811
+#
+# Ensure that fallocate steps around reflinked ranges:
+#   - Reflink parts of two files together
+#   - Fallocate all the other sparse space.
+#   - Check that the reflinked areas are still there.
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+rm -rf $tmp.* $TESTDIR
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_require_test_reflink
+_supported_os Linux
+
+_require_xfs_io_command "fiemap"
+_require_xfs_io_command "reflink"
+_require_xfs_io_command "falloc"
+_require_cp_reflink
+_require_test
+
+rm -f $seqres.full
+
+TESTDIR=$TEST_DIR/test-$seq
+rm -rf $TESTDIR
+mkdir $TESTDIR
+
+echo "Create the original files"
+BLKSZ="$(stat -f $TESTDIR -c '%S')"
+$XFS_IO_PROG -f -c "truncate $((BLKSZ * 5))" -c "pwrite -S 0x61 0 $(( (BLKSZ * 5) + 37))" $TESTDIR/file1 >> $seqres.full
+$XFS_IO_PROG -f -c "truncate $((BLKSZ * 5))" -c "reflink $TESTDIR/file1 $BLKSZ $BLKSZ $(( (BLKSZ * 4) + 37))" $TESTDIR/file2 >> $seqres.full
+$XFS_IO_PROG -f -c "truncate $((BLKSZ * 5))" -c "reflink $TESTDIR/file1 0 0 $BLKSZ" $TESTDIR/file3 >> $seqres.full
+$XFS_IO_PROG -f -c "truncate $((BLKSZ * 5))" -c "reflink $TESTDIR/file1 $BLKSZ $BLKSZ $BLKSZ" $TESTDIR/file4 >> $seqres.full
+$XFS_IO_PROG -f -c "truncate $((BLKSZ * 5))" -c "reflink $TESTDIR/file1 $((BLKSZ * 3)) $((BLKSZ * 3)) $BLKSZ" $TESTDIR/file4 >> $seqres.full
+cp --reflink=always $TESTDIR/file1 $TESTDIR/file5
+
+md5sum $TESTDIR/file1 | _filter_test_dir
+md5sum $TESTDIR/file2 | _filter_test_dir
+md5sum $TESTDIR/file3 | _filter_test_dir
+md5sum $TESTDIR/file4 | _filter_test_dir
+md5sum $TESTDIR/file5 | _filter_test_dir
+
+echo "falloc everything"
+$XFS_IO_PROG -f -c "falloc 0 $((BLKSZ * 5))" $TESTDIR/file2
+$XFS_IO_PROG -f -c "falloc 0 $((BLKSZ * 5))" $TESTDIR/file3
+$XFS_IO_PROG -f -c "falloc 0 $((BLKSZ * 5))" $TESTDIR/file4
+$XFS_IO_PROG -f -c "falloc 0 $((BLKSZ * 6))" $TESTDIR/file5
+sync
+echo 3 > /proc/sys/vm/drop_caches
+
+echo "Checksum all files"
+md5sum $TESTDIR/file1 | _filter_test_dir
+md5sum $TESTDIR/file2 | _filter_test_dir
+md5sum $TESTDIR/file3 | _filter_test_dir
+md5sum $TESTDIR/file4 | _filter_test_dir
+md5sum $TESTDIR/file5 | _filter_test_dir
+
+checker() {
+   echo '---'
+   for nr in `seq 0 $2`; do
+   echo "Check

[PATCH 05/11] test CoW behaviors of reflinked files

2015-06-29 Thread Darrick J. Wong
Ensure that CoW happens correctly with buffered, directio, and mmap writes.
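
The checks in generic/808 reduce to three conditions after writing through
one reflinked copy:
  * the written ranges must differ from the original;
  * they must match a control file that was never shared;
  * the original must be untouched.
Below is a portable sketch of that comparison, assuming bash and GNU
coreutils. `overwrite`, `read_range` and `cow_check` are hypothetical
stand-ins for the xfs_io pwrite/pread commands. `cp --reflink=auto` falls
back to an ordinary copy where reflink is unsupported, so the sketch runs
anywhere, but it only exercises CoW on btrfs or a reflink-enabled XFS.

```shell
#!/bin/bash
# Write COUNT bytes of 'x' into FILE at byte OFFSET without truncating it
# (hypothetical stand-in for xfs_io "pwrite OFFSET COUNT").
overwrite() {
	head -c "$3" /dev/zero | tr '\0' 'x' |
		dd of="$1" bs=1 seek="$2" conv=notrunc status=none
}

# Print LEN bytes of FILE starting at byte OFFSET (stand-in for "pread -v").
read_range() {
	tail -c +"$(( $2 + 1 ))" "$1" | head -c "$3"
}

# Reproduce the 808 flow; prints "BUG: ..." lines on failure, then "done".
cow_check() {
	local dir size=196605 off
	dir="$(mktemp -d)"
	head -c "$size" /dev/zero | tr '\0' 'a' > "$dir/file1"
	# Shared copy where the fs supports it, plain copy otherwise.
	cp --reflink=auto "$dir/file1" "$dir/file2"
	cp "$dir/file1" "$dir/file3"	# control copy, never shared

	for off in 0 6 196600; do
		overwrite "$dir/file2" "$off" 17
		overwrite "$dir/file3" "$off" 17
	done

	for off in 0 6 196600; do
		# The written sections must now differ from the original...
		[ "$(read_range "$dir/file1" "$off" 17)" != \
		  "$(read_range "$dir/file2" "$off" 17)" ] ||
			echo "BUG: section at $off unchanged in file2"
		# ...and match the control file byte for byte.
		[ "$(read_range "$dir/file2" "$off" 17)" = \
		  "$(read_range "$dir/file3" "$off" 17)" ] ||
			echo "BUG: section at $off differs from control"
	done
	# Writes through file2 must never leak into file1.
	[ "$(wc -c < "$dir/file1")" -eq "$size" ] ||
		echo "BUG: file1 size changed"
	tr -d 'a' < "$dir/file1" | grep -q . && echo "BUG: file1 data changed"
	rm -rf "$dir"
	echo "done"
}

cow_check
```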

Signed-off-by: Darrick J. Wong 
---
 tests/generic/808 |  138 +
 tests/generic/808.out |   16 ++
 tests/generic/809 |  138 +
 tests/generic/809.out |   16 ++
 tests/generic/810 |  138 +
 tests/generic/810.out |   16 ++
 tests/generic/837 |   88 +++
 tests/generic/837.out |    7 ++
 tests/generic/838 |   88 +++
 tests/generic/838.out |    7 ++
 tests/generic/group   |    5 ++
 11 files changed, 657 insertions(+)
 create mode 100755 tests/generic/808
 create mode 100644 tests/generic/808.out
 create mode 100755 tests/generic/809
 create mode 100644 tests/generic/809.out
 create mode 100755 tests/generic/810
 create mode 100644 tests/generic/810.out
 create mode 100755 tests/generic/837
 create mode 100644 tests/generic/837.out
 create mode 100755 tests/generic/838
 create mode 100644 tests/generic/838.out


diff --git a/tests/generic/808 b/tests/generic/808
new file mode 100755
index 000..10a79b5
--- /dev/null
+++ b/tests/generic/808
@@ -0,0 +1,138 @@
+#! /bin/bash
+# FS QA Test No. 808
+#
+# Ensure that copy-on-write through the page cache works:
+#   - Reflink two files together
+#   - Write to the beginning, middle, and end
+#   - Check that the files are now different where we say they're different.
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+rm -rf $tmp.* $TESTDIR
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_require_test_reflink
+_supported_os Linux
+
+_require_xfs_io_command "fiemap"
+_require_cp_reflink
+_require_test
+
+rm -f $seqres.full
+
+TESTDIR=$TEST_DIR/test-$seq
+rm -rf $TESTDIR
+mkdir $TESTDIR
+
+echo "Create the original files"
+$XFS_IO_PROG -f -c 'pwrite -S 0x61 0 196605' $TESTDIR/file1 >> $seqres.full
+cp --reflink $TESTDIR/file1 $TESTDIR/file2 >> $seqres.full
+$XFS_IO_PROG -f -c 'pwrite -S 0x61 0 196605' $TESTDIR/file3 >> $seqres.full
+
+md5sum $TESTDIR/file1 | _filter_test_dir
+md5sum $TESTDIR/file2 | _filter_test_dir
+md5sum $TESTDIR/file3 | _filter_test_dir
+
+echo "CoW the second file"
+$XFS_IO_PROG -f -c "pwrite 0 17" $TESTDIR/file2 >> $seqres.full
+$XFS_IO_PROG -f -c "pwrite 0 17" $TESTDIR/file3 >> $seqres.full
+
+$XFS_IO_PROG -f -c "pwrite 6 17" $TESTDIR/file2 >> $seqres.full
+$XFS_IO_PROG -f -c "pwrite 6 17" $TESTDIR/file3 >> $seqres.full
+
+$XFS_IO_PROG -f -c "pwrite 196600 17" $TESTDIR/file2 >> $seqres.full
+$XFS_IO_PROG -f -c "pwrite 196600 17" $TESTDIR/file3 >> $seqres.full
+sync
+echo 3 > /proc/sys/vm/drop_caches
+
+echo "Checksum both files"
+md5sum $TESTDIR/file1 | _filter_test_dir
+md5sum $TESTDIR/file2 | _filter_test_dir
+md5sum $TESTDIR/file3 | _filter_test_dir
+
+echo "Compare the CoW'd section to the before file"
+cmp -s <($XFS_IO_PROG -f -c 'pread -q -v 0 17' $TESTDIR/file1) \
+   <($XFS_IO_PROG -f -c 'pread -q -v 0 17' $TESTDIR/file2) \
+   || echo "Sections do not match (intentional)"
+cmp -s <($XFS_IO_PROG -f -c 'pread -q -v 6 17' $TESTDIR/file1) \
+   <($XFS_IO_PROG -f -c 'pread -q -v 6 17' $TESTDIR/file2) \
+   || echo "Sections do not match (intentional)"
+cmp -s <($XFS_IO_PROG -f -c 'pread -q -v 196600 17' $TESTDIR/file1) \
+   <($XFS_IO_PROG -f -c 'pread -q -v 196600 17' $TESTDIR/file2) \
+   || echo "Sections do not match (intentional)"
+
+echo "Compare the CoW'd section to the after file"
+cmp -s <($XFS_IO_PROG -f -c 'pread -q -v 0 17' $TESTDIR/file2) \
+   <($XFS_IO_PROG -f -c 'pread -q -v 0 17' $TESTDIR/file3) \
+   || echo "Sections do not match"
+cmp -s <($XFS_IO_PROG -f -c 'pread -q -v 6 17' $TESTDIR/file2) \
+   <($XFS_IO_PROG -f -c 'pread -q -v 6 17' $TESTDIR/file3) \
+   || echo "Se

[PATCH 04/11] basic tests of the reflink and dedupe ioctls

2015-06-29 Thread Darrick J. Wong
Test the operation of the btrfs (and now xfs) reflink and dedupe
ioctls at various file offsets and with matching and nonmatching
files.
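
Several of these tests decide whether a range is shared by scanning
`xfs_io -c 'fiemap -v'` output, as `_check_shared_extent` in common/rc does
with awk. Below is a pure-bash sketch of the same count. It assumes:
  * the usual `fiemap -v` layout, with offsets in 512-byte units and hex
    flags in the last column;
  * that 0x2000 is FIEMAP_EXTENT_SHARED.
`count_shared_extents` is a hypothetical name. Bash arithmetic replaces
gawk's `and()` and `strtonum()`.

```shell
#!/bin/bash
# Count extents in `fiemap -v` output (on stdin) that overlap the byte range
# [START, START+LEN) and have FIEMAP_EXTENT_SHARED (0x2000) set.
count_shared_extents() {
	local start="$1" len="$2" count=0 line ext_start ext_end flags
	# Matches "  N: [LSTART..LEND]:  PSTART..PEND  TOTAL FLAGS";
	# header lines and hole lines (no physical range) do not match.
	local re='^[[:space:]]*[0-9]+:[[:space:]]*\[([0-9]+)\.\.([0-9]+)]:[[:space:]]+[0-9]+\.\.[0-9]+[[:space:]]+[0-9]+[[:space:]]+(0x[0-9a-fA-F]+)'
	while IFS= read -r line; do
		[[ $line =~ $re ]] || continue
		# Convert 512-byte sector numbers to a half-open byte range.
		ext_start=$(( BASH_REMATCH[1] * 512 ))
		ext_end=$(( (BASH_REMATCH[2] + 1) * 512 ))
		flags=${BASH_REMATCH[3]}
		# Bash arithmetic understands the 0x prefix in $flags.
		if (( ext_start < start + len && ext_end > start && (flags & 0x2000) )); then
			count=$(( count + 1 ))
		fi
	done
	echo "$count"
}
```

Usage would be along the lines of
`$XFS_IO_PROG -c 'fiemap -v' file | count_shared_extents 0 65536`, which
prints the number of shared extents overlapping the first 64KiB.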

Signed-off-by: Darrick J. Wong 
---
 common/rc |   44 +
 tests/generic/803 |   81 +++
 tests/generic/803.out |   11 
 tests/generic/804 |   86 
 tests/generic/804.out |   13 +
 tests/generic/805 |  113 +++
 tests/generic/805.out |   26 ++
 tests/generic/806 |   81 +++
 tests/generic/806.out |   11 
 tests/generic/807 |   87 +
 tests/generic/807.out |   15 ++
 tests/generic/817 |   89 ++
 tests/generic/817.out |   10 
 tests/generic/818 |   94 +++
 tests/generic/818.out |   12 +
 tests/generic/819 |  130 +
 tests/generic/819.out |    7 +++
 tests/generic/820 |  118 
 tests/generic/820.out |    7 +++
 tests/generic/group   |    9 +++
 20 files changed, 1044 insertions(+)
 create mode 100755 tests/generic/803
 create mode 100644 tests/generic/803.out
 create mode 100755 tests/generic/804
 create mode 100644 tests/generic/804.out
 create mode 100755 tests/generic/805
 create mode 100644 tests/generic/805.out
 create mode 100755 tests/generic/806
 create mode 100644 tests/generic/806.out
 create mode 100755 tests/generic/807
 create mode 100644 tests/generic/807.out
 create mode 100755 tests/generic/817
 create mode 100644 tests/generic/817.out
 create mode 100755 tests/generic/818
 create mode 100644 tests/generic/818.out
 create mode 100755 tests/generic/819
 create mode 100644 tests/generic/819.out
 create mode 100755 tests/generic/820
 create mode 100644 tests/generic/820.out


diff --git a/common/rc b/common/rc
index 8f20dc8..7521033 100644
--- a/common/rc
+++ b/common/rc
@@ -2604,6 +2604,50 @@ _verify_reflink()
|| echo "$1 and $2 are not reflinks: different extents"
 }
 
+# Check that a particular range is shared
+# args: filename, start, length (both in bytes)
+_check_shared_extent()
+{
+   $XFS_IO_PROG -f -c 'fiemap -v' $1 | tr '[].:' ' ' | \
+   awk "BEGIN {x = 0;} {start = \$2 * 512; end = (\$3 + 1) * 512; if (and(strtonum(\$7), 0x2000) && start < ($3 + $2) && end > $2) {x++;}} END {print x;}"
+}
+
+# Retrieve the pblk(s) associated with a file's lblk range
+# args: filename, start, len, cache_file
+_extent_physical()
+{
+   if [ -z "$3" ]; then
+   len="512"
+   else
+   len="$3"
+   fi
+   fiemap_cache_file="$4"
+   rm_cache_file=0
+   if [ -z "$fiemap_cache_file" ]; then
+   fiemap_cache_file="/tmp/fiemap.$$"
+   rm_cache_file=1
+   fi
+   if [ ! -e "$fiemap_cache_file" ]; then
+   $XFS_IO_PROG -f -c 'fiemap -v' "$1" | tr '[].:' ' ' | tail -n +3 > "$fiemap_cache_file"
+   fi
+   cat "$fiemap_cache_file" | \
+   awk "BEGIN {x = 0;} {start = \$2 * 512; end = (\$3 + 1) * 512; phys = \$4 * 512; if (start < ($len + $2) && end > $2) {if (\$4 == \"hole\") {printf(\"-1\n\");} else {printf(\"%d\n\", phys + $2 - start);}}}"
+   if [ "$rm_cache_file" -eq 1 ]; then
+   rm "$fiemap_cache_file"
+   fi
+}
+
+_numbers_equal()
+{
+   n="$1"
+   shift
+   i=1
+   for num in "$@"; do
+   test "$num" -eq "$n" || echo "num #$i does not match #0"
+   i=$((i + 1))
+   done
+}
+
 _require_atime()
 {
if [ "$FSTYP" == "nfs" ]; then
diff --git a/tests/generic/803 b/tests/generic/803
new file mode 100755
index 000..47d89a7
--- /dev/null
+++ b/tests/generic/803
@@ -0,0 +1,81 @@
+#! /bin/bash
+# FS QA Test No. 803
+#
+# Ensure that we can reflink parts of two files:
+#   - Reflink identical parts of two identical files
+#   - Check that we end up with identical contents
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $se

[PATCH 03/11] generic/32[6-8]: support xfs in addition to btrfs

2015-06-29 Thread Darrick J. Wong
Modify the reflink tests to support xfs.
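
The new `_require_test_reflink` helper dispatches on $FSTYP:
  * XFS is probed with `xfs_info`;
  * btrfs is treated as always supporting reflink;
  * anything else is skipped.
Below is a sketch of that decision. The `xfs_info` text is passed in as an
argument, so the logic can be exercised without a mounted filesystem.
`reflink_supported` is hypothetical, and bash is assumed.

```shell
#!/bin/bash
# Hypothetical helper: succeed if FSTYP (with xfs_info output INFO for xfs)
# should be treated as reflink-capable, mirroring the case statement above.
reflink_supported() {
	local fstyp="$1" info="$2"
	case "$fstyp" in
	xfs)
		# An xfs formatted with the reflink feature reports reflink=1.
		grep -q 'reflink=1' <<< "$info"
		;;
	btrfs)
		# The patch treats btrfs as always supporting reflink.
		return 0
		;;
	*)
		return 1
		;;
	esac
}
```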

Signed-off-by: Darrick J. Wong 
---
 common/rc |   37 +
 tests/generic/800 |    2 +-
 tests/generic/801 |    2 +-
 tests/generic/802 |    2 +-
 4 files changed, 40 insertions(+), 3 deletions(-)


diff --git a/common/rc b/common/rc
index 51d2fcb..8f20dc8 100644
--- a/common/rc
+++ b/common/rc
@@ -1397,6 +1397,43 @@ _require_xfs_crc()
umount $SCRATCH_MNT
 }
 
+# this test requires the test fs support reflink...
+#
+_require_test_reflink()
+{
+case $FSTYP in
+xfs)
+   xfs_info "${TEST_DIR}" | grep reflink=1 -c -q || _notrun "Reflink not supported by this filesystem type: $FSTYP"
+   ;;
+btrfs)
+true
+;;
+*)
+_notrun "Reflink not supported by this filesystem type: $FSTYP"
+;;
+esac
+}
+
+# this test requires the scratch fs support reflink...
+#
+_require_scratch_reflink()
+{
+case $FSTYP in
+xfs)
+   _scratch_mkfs > /dev/null 2>&1
+   _scratch_mount
+   xfs_info "${SCRATCH_MNT}" | grep reflink=1 -c -q || _notrun "$FSTYP does not support reflink"
+   _scratch_unmount
+   ;;
+btrfs)
+true
+;;
+*)
+_notrun "Reflink not supported by this filesystem type: $FSTYP"
+;;
+esac
+}
+
 # this test requires the bigalloc feature to be available in mkfs.ext4
 #
 _require_ext4_mkfs_bigalloc()
diff --git a/tests/generic/800 b/tests/generic/800
index a71f11a..954f39d 100755
--- a/tests/generic/800
+++ b/tests/generic/800
@@ -45,7 +45,7 @@ _cleanup()
 . common/filter
 
 # real QA test starts here
-_supported_fs btrfs
+_require_test_reflink
 _supported_os Linux
 
 _require_xfs_io_command "fiemap"
diff --git a/tests/generic/801 b/tests/generic/801
index b21c44b..aedb6e9 100755
--- a/tests/generic/801
+++ b/tests/generic/801
@@ -45,7 +45,7 @@ _cleanup()
 . common/filter
 
 # real QA test starts here
-_supported_fs btrfs
+_require_test_reflink
 _supported_os Linux
 
 _require_xfs_io_command "fiemap"
diff --git a/tests/generic/802 b/tests/generic/802
index afd8513..51d3414 100755
--- a/tests/generic/802
+++ b/tests/generic/802
@@ -43,7 +43,7 @@ _cleanup()
 . ./common/filter
 
 # real QA test starts here
-_supported_fs btrfs
+_require_test_reflink
 _supported_os Linux
 
 _require_xfs_io_command "fiemap"



[PATCH 02/11] move btrfs reflink tests to generic

2015-06-29 Thread Darrick J. Wong
Move the cp --reflink tests from btrfs/ to generic/ since xfs now
supports that ioctl.

Signed-off-by: Darrick J. Wong 
---
 tests/btrfs/026   |   92 -
 tests/btrfs/026.out   |   16 ---
 tests/btrfs/027   |  109 -
 tests/btrfs/027.out   |   25 ---
 tests/btrfs/028   |   83 -
 tests/btrfs/028.out   |    7 ---
 tests/btrfs/group     |    3 -
 tests/generic/800 |   92 +
 tests/generic/800.out |   16 +++
 tests/generic/801 |  109 +
 tests/generic/801.out |   25 +++
 tests/generic/802 |   83 +
 tests/generic/802.out |    7 +++
 tests/generic/group   |    3 +
 14 files changed, 335 insertions(+), 335 deletions(-)
 delete mode 100755 tests/btrfs/026
 delete mode 100644 tests/btrfs/026.out
 delete mode 100755 tests/btrfs/027
 delete mode 100644 tests/btrfs/027.out
 delete mode 100755 tests/btrfs/028
 delete mode 100644 tests/btrfs/028.out
 create mode 100755 tests/generic/800
 create mode 100644 tests/generic/800.out
 create mode 100755 tests/generic/801
 create mode 100644 tests/generic/801.out
 create mode 100755 tests/generic/802
 create mode 100644 tests/generic/802.out


diff --git a/tests/btrfs/026 b/tests/btrfs/026
deleted file mode 100755
index 7559ca2..000
--- a/tests/btrfs/026
+++ /dev/null
@@ -1,92 +0,0 @@
-#! /bin/bash
-# FS QA Test No. 026
-#
-# Tests file clone functionality of btrfs ("reflinks"):
-#   - Reflink a file
-#   - Reflink the reflinked file
-#   - Modify the original file
-#   - Modify the reflinked file
-#
-#---
-# Copyright (c) 2014, Oracle and/or its affiliates.  All Rights Reserved.
-#
-# This program is free software; you can redistribute it and/or
-# modify it under the terms of the GNU General Public License as
-# published by the Free Software Foundation.
-#
-# This program is distributed in the hope that it would be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, write the Free Software Foundation,
-# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
-#---
-#
-
-seq=`basename $0`
-seqres=$RESULT_DIR/$seq
-echo "QA output created by $seq"
-
-here=`pwd`
-tmp=/tmp/$$
-status=1	# failure is the default!
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-_cleanup()
-{
-cd /
-rm -f $tmp.*
-}
-
-# get standard environment, filters and checks
-. common/rc
-. common/filter
-
-# real QA test starts here
-_supported_fs btrfs
-_supported_os Linux
-
-_require_xfs_io_command "fiemap"
-_require_cp_reflink
-_require_test
-
-TESTDIR1=$TEST_DIR/test-$seq
-rm -rf $TESTDIR1
-mkdir $TESTDIR1
-
-_checksum_files() {
-for F in original copy1 copy2
-do
-md5sum $TESTDIR1/$F | _filter_test_dir
-done
-}
-
-rm -f $seqres.full
-
-echo "Create the original file and reflink to copy1, copy2"
-$XFS_IO_PROG -f -c 'pwrite -S 0x61 0 9000' $TESTDIR1/original \
->> $seqres.full 2>&1
-cp --reflink $TESTDIR1/original $TESTDIR1/copy1
-cp --reflink $TESTDIR1/copy1 $TESTDIR1/copy2
-_verify_reflink $TESTDIR1/original $TESTDIR1/copy1
-_verify_reflink $TESTDIR1/original $TESTDIR1/copy2
-echo "Original md5sums:"
-_checksum_files
-
-echo "Overwrite original file with new data"
-$XFS_IO_PROG -c 'pwrite -S 0x62 0 9000' $TESTDIR1/original \
->> $seqres.full 2>&1
-echo "md5sums after overwriting original:"
-_checksum_files
-
-echo "Overwrite copy1 with different new data"
-$XFS_IO_PROG -c 'pwrite -S 0x63 0 9000' $TESTDIR1/copy1 \
->> $seqres.full 2>&1
-echo "md5sums after overwriting copy1:"
-_checksum_files
-
-# success, all done
-status=0
-exit
diff --git a/tests/btrfs/026.out b/tests/btrfs/026.out
deleted file mode 100644
index 3b90ff0..000
--- a/tests/btrfs/026.out
+++ /dev/null
@@ -1,16 +0,0 @@
-QA output created by 026
-Create the original file and reflink to copy1, copy2
-Original md5sums:
-42d69d1a6d333a7ebdf64792a555e392  TEST_DIR/test-026/original
-42d69d1a6d333a7ebdf64792a555e392  TEST_DIR/test-026/copy1
-42d69d1a6d333a7ebdf64792a555e392  TEST_DIR/test-026/copy2
-Overwrite original file with new data
-md5sums after overwriting original:
-4a847a25439532bf48b68c9e9536ed5b  TEST_DIR/test-026/original
-42d69d1a6d333a7ebdf64792a555e392  TEST_DIR/test-026/copy1
-42d69d1a6d333a7ebdf64792a555e392  TEST_DIR/test-026/copy2
-Overwrite copy1 with different new data
-md5sums after overwriting copy1:
-4a847a25439532bf48b68c9e9536ed5b  TEST_DIR/test-026/original
-e271cd47d9f62ebc96cb4e67ae4d16db  TEST_DIR/test-

[RFC 00/11] xfstests: test the btrfs/xfs reflink/dedupe ioctls

2015-06-29 Thread Darrick J. Wong
Hi all,

This is an RFC-quality pass at making xfstests perform more rigorous
testing of the btrfs/xfs file clone, reflink, and dedupe ioctls.
There are now tests of the basic functionality of the three ioctls;
tests to ensure that the filesystem exhibits the expected copy-on-write
semantics; tests to try to suss out race conditions in the new
write paths; tests to ensure that the ioctls perform basic disk
accounting correctly; tests of the interaction between reflink and the
various fallocate verbs (allocate, punch, zero range, collapse range,
insert range);
and some attempts to test the upper limits of reflinking.  The first
patch in the series adds fuzz testing to ext4 and XFS; aside from
being first in line, it isn't tied to any of the reflink
functionality.
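
The dedupe ioctl only shares blocks when the two ranges hold identical bytes.
The kernel does that comparison itself. Below is a sketch of the same
precondition in userspace, for example when building test fixtures that
should or should not dedupe. It assumes GNU cmp (`-i SKIP1:SKIP2`,
`-n LIMIT`). `ranges_identical` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: succeed if LEN bytes of SRC at SRC_OFF equal LEN bytes
# of DST at DST_OFF. A range running past EOF in either file counts as a
# mismatch, because cmp reports EOF and exits 1.
ranges_identical() {
	local src="$1" src_off="$2" dst="$3" dst_off="$4" len="$5"
	# -i skips SRC_OFF/DST_OFF bytes in each file; -n limits the comparison.
	cmp -s -i "$src_off:$dst_off" -n "$len" "$src" "$dst"
}
```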

To run these tests, you'll have to patch xfsprogs to have reflink and
dedupe support[1]; the patch ought to apply fairly cleanly against the
upstream git tree.  They should more or less work with the btrfs that
ships in Linux 4.1, though if you want to test the XFS implementation,
you're going to have to apply a lot of patches to the kernel and
xfsprogs.  See the cover letters[2][3] for those patchsets for more
information.

Known issues: 
 * I think the race checks for dedupe could be a little sharper at
   finding mistakes.
 * I started the numbering really high to prevent the tests from
   colliding with whatever new tests might arrive; this will require
   some intervention to fix.
 * When ext4 gains reflink support, it shouldn't be difficult to make
   these tests run on it.  The patch set is based on the current
   xfstests master on kernel.org.
 * If the copy_file_range syscall ever comes around, we'll have to
   adapt xfs_io to use that in addition to the btrfs ioctls.

Comments and questions are, as always, welcome.

--D

[1] http://djwong.org/docs/03-xfs_io-reflink-and-dedupe.patch
[2] See thread "[RFC 00/15] xfsprogs: support the reflink btree" dated today.
[3] http://oss.sgi.com/archives/xfs/2015-06/msg00407.html


[PATCH 01/11] fuzz XFS and ext4 filesystems

2015-06-29 Thread Darrick J. Wong
Introduce tests for XFS and ext4 which format a filesystem, populate
it, and then use blocktrash and e2fuzz to corrupt the metadata.  The FS
is remounted, modified, and unmounted.  Following that, xfs_repair or
e2fsck is run until it no longer finds errors to correct, after which
the FS is mounted yet again and exercised to see if there are any
errors remaining.
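
The repair step runs the checker until it reports a clean filesystem, and
gives up after a fixed number of passes (FSCK_PASSES in the ext4 test).
Below is a minimal bash sketch of that loop. `repair_until_clean` is
hypothetical. It assumes the command exits 0 only when it found nothing to
fix. This matches e2fsck -f -y, where exit status 1 means errors were
corrected and another pass is needed to confirm.

```shell
#!/bin/bash
# Hypothetical helper: run CMD... up to MAX times until it exits 0.
# Prints the number of passes used; returns 1 if it never came back clean.
repair_until_clean() {
	local max="$1" pass
	shift
	for (( pass = 1; pass <= max; pass++ )); do
		if "$@"; then
			echo "$pass"
			return 0
		fi
	done
	echo "$max"
	return 1
}
```

With the real tool, the call would be something like
`repair_until_clean "$FSCK_PASSES" e2fsck -f -y "$SCRATCH_DEV"`.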

Signed-off-by: Darrick J. Wong 
---
 tests/ext4/700 |  208 +
 tests/ext4/700.out |    3 +
 tests/ext4/group   |    1 
 tests/xfs/700  |  219 
 tests/xfs/700.out  |    3 +
 tests/xfs/group    |    1 
 6 files changed, 435 insertions(+)
 create mode 100755 tests/ext4/700
 create mode 100644 tests/ext4/700.out
 create mode 100755 tests/xfs/700
 create mode 100644 tests/xfs/700.out


diff --git a/tests/ext4/700 b/tests/ext4/700
new file mode 100755
index 000..21de274
--- /dev/null
+++ b/tests/ext4/700
@@ -0,0 +1,208 @@
+#! /bin/bash
+# FS QA Test No. 700
+#
+# Create and populate an ext4 filesystem, fuzz the metadata, then see how
+# the kernel reacts, how e2fsck fares in fixing the mess, and then
+# try more kernel accesses to see if it really fixed things.
+#
+#---
+# Copyright (c) 2015 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+#rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+
+if [ ! -x "$(which e2fuzz)" ]; then
+   _notrun "Couldn't find e2fuzz"
+fi
+
+# real QA test starts here
+_supported_fs ext4
+_supported_os Linux
+
+_require_scratch
+_require_attrs
+
+rm -f $seqres.full
+echo "See interesting results in $seqres.full" | sed -e "s,$RESULT_DIR,RESULT_DIR,g"
+SRCDIR=`pwd`
+test -z "${FUZZ_ARGS}" && FUZZ_ARGS="-b 32 -v"
+test -z "${FSCK_PASSES}" && FSCK_PASSES=10
+BLK_SZ=4096
+
+echo "fuzzing ext4 with FUZZ_ARGS=$FUZZ_ARGS and FSCK_PASSES=$FSCK_PASSES" > $seqres.full
+
+echo "+ create scratch fs" >> $seqres.full
+_scratch_mkfs_ext4 >> $seqres.full 2>&1
+
+echo "+ populate fs image" >> $seqres.full
+_scratch_mount >> $seqres.full 2>&1
+
+echo "+ check fs creation" >> $seqres.full
+dumpe2fs -h "${SCRATCH_DEV}" >> $seqres.full 2>&1
+
+SRC_SZ="$(du -ks "${SRCDIR}" | cut -f 1)"
+FS_SZ="$(( $(stat -f "${SCRATCH_MNT}" -c '%a * %S') / 1024 ))"
+NR="$(( (FS_SZ * 6 / 10) / SRC_SZ ))"
+if [ "${NR}" -lt 1 ]; then
+   NR=1
+fi
+
+echo "+ make some copies" >> $seqres.full
+seq 1 "${NR}" | while read nr; do
+   cp -pRdu "${SRCDIR}" "${SCRATCH_MNT}/test.${nr}" >> $seqres.full 2>&1
+done
+umount "${SCRATCH_MNT}" >> $seqres.full 2>&1
+
+echo "+ check fs" >> $seqres.full
+_check_scratch_fs >> $seqres.full 2>&1
+
+echo "++ corrupt image" >> $seqres.full
+e2fuzz ${FUZZ_ARGS} ${SCRATCH_DEV} >> $seqres.full 2>&1
+
+echo "++ mount image" >> $seqres.full
+_scratch_mount >> $seqres.full 2>&1
+
+echo "+++ ls -laR" >> $seqres.full
+ls -laR "${SCRATCH_MNT}/test.1/" >/dev/null 2>&1
+
+echo "+++ cat files" >> $seqres.full
+(find "${SCRATCH_MNT}/test.1/" -type f -size -1048576k -print0 | xargs -0 cat) >/dev/null 2>&1
+
+echo "+++ expand" >> $seqres.full
+find "${SCRATCH_MNT}/" -type f 2> /dev/null | head -n 5 | while read f; do
+   attr -l "$f" > /dev/null 2>&1
+   if [ -f "$f" -a -w "$f" ]; then
+   dd if=/dev/zero bs="${BLK_SZ}" count=1 >> "$f" 2>/dev/null
+   fi
+   mv "$f" "$f.longer" > /dev/null 2>/dev/null
+done
+sync
+
+echo "+++ create files" >> $seqres.full
+cp -pRdu "${SRCDIR}" "${SCRATCH_MNT}/test.moo" > /dev/null 2>&1
+sync
+
+echo "+++ remove files" >> $seqres.full
+rm -rf "${SCRATCH_MNT}/test.moo" > /dev/null 2>&1
+rm -rf "${SCRATCH_MNT}/test.1" > /dev/null 2>&1
+umount "${SCRATCH_MNT}"
+
+_fsck_pass() {
+   fsck_pass="$1"
+
+   FSCK_LOG="${tmp}-fuzz-${fsck_pass}.log"
+   echo "++ fsck pass ${fsck_pass}" > "${FSCK_LOG}"
+   e2fsck -f -y "${SCRATCH_DEV}"
+   res=$?
+   if [ "${res}" -eq 0 ]; then
+   echo "++ allegedly fixe