Re: suspected BTRFS errors resulting in file system becoming unrecoverable

2016-02-08 Thread Austin S. Hemmelgarn

On 2016-02-08 11:23, WillIam Thorne wrote:

Thanks all for the help. Here’s a bit more info below. Seeing as it’s
possibly related to the USB implementation on the Pi, I have cc’d their
mailing list.

Glad we could be of assistance.



On 25 Jan 2016, at 16:43, Austin S. Hemmelgarn wrote:

On 2016-01-25 09:58, WillIam Thorne wrote:

Hi

I have a WD 3TB external HD attached over USB to an ARM-based micro
PC (Raspberry Pi). I was experimenting with btrfs for storing email
archives but recently encountered some problems which resulted in the
filesystem becoming apparently unrecoverable. I’m not an expert, and
it was quicker to switch back to ext4 and restore from backup, so no
support is needed. Here is what appears to be the relevant part of the
syslog, including the stack trace in case it is useful:

Best
W

pi@mail /var/log $ btrfs --version
Btrfs Btrfs v0.19

In general, if you plan to use BTRFS on Debian (or Raspbian), you
should be building the tools yourself locally; Debian is almost as bad
about staying up to date as most enterprise distros.


pi@mail /var/log $ uname -a
Linux mail 4.1.7-v7+ #817 SMP PREEMPT Sat Sep 19 15:32:00 BST 2015 armv7l GNU/Linux

Jan 20 09:42:08 mail kernel: [2762753.507576] usb 1-1.5: reset high-speed USB device number 4 using dwc_otg

The device reset always seemed to happen directly after my tarsnap
backup ran, although this had been running fine for a month or so
beforehand. I noticed the problems when I came back from holiday over
Christmas. Maybe it’s load-related; the USB driver/controller on the Pi
used to be a little buggy, so maybe they didn’t catch everything.

If it was working correctly that long before this happened, that says 
one of two things to me:
1. It's a non-periodic intermittent error due to a design flaw or 
manufacturing defect in part of the hardware.

2. Some part of the hardware is failing.

Based on what you say below, I think the first one is the case.  Either 
way though, I would suggest you make sure you have working backups of 
any data you care about on this device, as either case is likely to 
cause data loss.



Jan 20 09:43:18 mail kernel: [2762823.972777] sd 0:0:0:0: [sda] UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jan 20 09:43:18 mail kernel: [2762823.972806] sd 0:0:0:0: [sda] Sense Key : 0x2 [current]
Jan 20 09:43:18 mail kernel: [2762823.972819] sd 0:0:0:0: [sda] ASC=0x3a ASCQ=0x0
Jan 20 09:43:18 mail kernel: [2762823.972837] sd 0:0:0:0: [sda] CDB: opcode=0x2a 2a 00 00 f7 2c 20 00 00 f0 00
Jan 20 09:43:18 mail kernel: [2762823.972851] blk_update_request: I/O error, dev sda, sector 16198688

This line right here ^^^ indicates that it was triggered by an issue
with the USB device.  I don't personally know enough about USB-MSC and
SCSI to know for certain what is happening, but you should probably
scan your logs and make sure you're not still getting stuff like this,
because if you are, you're likely to get data corruption on any
filesystem on the device.  Based on this, the BTRFS trace you got is
probably a result of problems with the USB device.
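As a rough cross-check of the log above, the CDB bytes can be decoded by hand. This is a minimal sketch, assuming the standard SCSI WRITE(10) layout (opcode 0x2A, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8); the helper name `decode_write10` is hypothetical. The decoded LBA matches the sector that `blk_update_request` reports, and in the standard SCSI sense tables key 0x2 is NOT READY and ASC/ASCQ 0x3A/0x00 is MEDIUM NOT PRESENT, which is consistent with the disk disappearing behind the USB bridge rather than with a filesystem-level bug.

```python
# Sketch: decode the SCSI WRITE(10) CDB and sense data from the kernel log above.
from typing import NamedTuple

# Lookup tables covering only the values seen in this log
# (assumption: standard SPC sense-key / ASC assignments).
SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {(0x3A, 0x00): "MEDIUM NOT PRESENT"}


class Write10(NamedTuple):
    lba: int     # logical block address
    blocks: int  # transfer length in logical blocks


def decode_write10(cdb_hex: str) -> Write10:
    """Decode a 10-byte WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5, big-endian
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8, big-endian
    return Write10(lba, blocks)


cmd = decode_write10("2a 00 00 f7 2c 20 00 00 f0 00")
print(cmd)  # Write10(lba=16198688, blocks=240)
print(SENSE_KEYS[0x2], "/", ASC_ASCQ[(0x3A, 0x00)])
```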

I reformatted the disk to ext4 on the 22nd of Jan and restored the
backed up data in full to the disk. Since then I have grepped for
‘error’ and ‘dwc_otg’ in my syslog every week, but have not seen the
errors again. I will send an update to the list in a month or two if I am
still not seeing them.

It may have been some design flaw in the USB device that caused it to 
not handle BTRFS write patterns well.  I've seen similar behavior with 
some really cheap SATA controllers before as well.  I'd be interested to 
see if similar issues occur with the same disk hooked up to a regular 
x86 system instead of a single-board computer like the Pi.
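The weekly grep William describes could be scripted so it also pulls out the failing sectors. A minimal sketch: the regular expressions are copied from the messages quoted in this thread and the function name `scan` is illustrative; on a real Raspbian system the lines would presumably come from `open("/var/log/syslog")`, which is an assumed path.

```python
# Sketch: scan syslog text for the USB reset / block I/O error lines seen above.
import re
from collections import Counter

# Patterns copied from the messages quoted in this thread; extend as needed.
PATTERNS = {
    "usb_reset": re.compile(r"reset high-speed USB device number \d+ using dwc_otg"),
    "io_error": re.compile(r"blk_update_request: I/O error, dev (\w+), sector (\d+)"),
    "btrfs_error": re.compile(r"BTRFS: error \(device (\w+)\)"),
}


def scan(lines):
    """Return a Counter of pattern hits and a list of (device, sector) failures."""
    hits = Counter()
    sectors = []
    for line in lines:
        for name, pat in PATTERNS.items():
            m = pat.search(line)
            if m:
                hits[name] += 1
                if name == "io_error":
                    sectors.append((m.group(1), int(m.group(2))))
    return hits, sectors


sample = [
    "Jan 20 09:42:08 mail kernel: [2762753.507576] usb 1-1.5: reset high-speed USB device number 4 using dwc_otg",
    "Jan 20 09:43:18 mail kernel: [2762823.972851] blk_update_request: I/O error, dev sda, sector 16198688",
]
hits, sectors = scan(sample)
print(dict(hits), sectors)
```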



Jan 20 09:43:18 mail kernel: [2762823.997601] BTRFS: error (device sda1) in btrfs_commit_transaction:2068: errno=-5 IO failure (Error while writing out transaction)
Jan 20 09:43:18 mail kernel: [2762824.011517] BTRFS info (device sda1): forced readonly
Jan 20 09:43:18 mail kernel: [2762824.011537] BTRFS warning (device sda1): Skipping commit of aborted transaction.
Jan 20 09:43:18 mail kernel: [2762824.011576] [ cut here ]
Jan 20 09:43:18 mail kernel: [2762824.011682] WARNING: CPU: 0 PID: 1318 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0xd8/0x128 [btrfs]()
Jan 20 09:43:18 mail kernel: [2762824.011709] BTRFS: Transaction aborted (error -5)
Jan 20 09:43:18 mail kernel: [2762824.011717] Modules linked in: cfg80211 rfkill snd_bcm2835 snd_pcm snd_seq snd_seq_device snd_timer snd btrfs xor xor_neon raid6_pq zlib_deflate sg bcm2835_gpiomem uio_pdrv_genirq uio
Jan 20 09:43:18 mail kernel: [2762824.011790] CPU: 0 PID: 1318 Comm: btrfs-transacti Not tainted 4.1.7-v7+ #817
Jan 20 09:43:18 mail kernel: [2762824.011797] Hardware name: BCM2709
Jan 20 09:43:18 mail kernel: [2762824.011832] [<80018440>] (unwind_backtrace) from [<80013e0c>] (show_stack+0x20/0x24)
Jan 20 

4.4.0 - no space left with >1.7 TB free space left

2016-02-08 Thread Tomasz Chmielewski
Linux 4.4.0 - btrfs is mainly used to host lots of test containers, 
often snapshots, and at times, there is heavy IO in many of them for 
extended periods of time. btrfs is on HDDs.



Every few days I'm getting "no space left" in a container running mongo 
3.2.1 database. Interestingly, haven't seen this issue in containers 
with MySQL. All databases have chattr +C set on their directories.


Why would it fail, if there is so much space left?


2016-02-07T06:06:14.648+ E STORAGE  [thread1] WiredTiger (28) [1454825174:633585][9105:0x7f2b7e33e700], file:collection-33-7895599108848542105.wt, WT_SESSION.checkpoint: collection-33-7895599108848542105.wt write error: failed to write 4096 bytes at offset 20480: No space left on device
2016-02-07T06:06:14.648+ E STORAGE  [thread1] WiredTiger (28) [1454825174:648740][9105:0x7f2b7e33e700], checkpoint-server: checkpoint server error: No space left on device
2016-02-07T06:06:14.648+ E STORAGE  [thread1] WiredTiger (-31804) [1454825174:648766][9105:0x7f2b7e33e700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic

2016-02-07T06:06:14.648+ I -[thread1] Fatal Assertion 28558
2016-02-07T06:06:14.648+ I -[thread1]

***aborting after fassert() failure


2016-02-07T06:06:14.694+ I -[WTJournalFlusher] Fatal Assertion 28559

2016-02-07T06:06:14.694+ I -[WTJournalFlusher]

***aborting after fassert() failure


2016-02-07T06:06:15.203+ F -[WTJournalFlusher] Got signal: 6 (Aborted).

# df -h /srv
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda4   2.7T  1.1T  1.7T  39% /srv

# btrfs fi df /srv
Data, RAID1: total=1.25TiB, used=1014.01GiB
System, RAID1: total=32.00MiB, used=240.00KiB
Metadata, RAID1: total=15.00GiB, used=13.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs fi show /srv
Label: 'btrfs'  uuid: 105b2e0c-8af2-45ee-b4c8-14ff0a3ca899
        Total devices 2 FS bytes used 1.00TiB
        devid    1 size 2.63TiB used 1.26TiB path /dev/sda4
        devid    2 size 2.63TiB used 1.26TiB path /dev/sdb4

btrfs-progs v4.0.1
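To compare the numbers above more easily, the `btrfs fi df` lines can be turned into a per-type fill ratio ("used" relative to the space already allocated as "total" for that block-group type). A minimal sketch, assuming the exact output format shown here (btrfs-progs v4.0.1) and only the B/KiB/MiB/GiB/TiB suffixes; `parse_fi_df` is a hypothetical helper. On this filesystem metadata is the fullest type at roughly 87.5% of its 15 GiB allocation, while `btrfs fi show` still reports about 1.37 TiB unallocated per device (2.63 TiB size minus 1.26 TiB used).

```python
# Sketch: parse `btrfs fi df` output and report how full each block-group type is.
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
# e.g. "Metadata, RAID1: total=15.00GiB, used=13.13GiB"
LINE = re.compile(r"^(\w+), (\w+): total=([\d.]+)(\w+), used=([\d.]+)(\w+)$")


def to_bytes(value: str, unit: str) -> float:
    return float(value) * UNITS[unit]


def parse_fi_df(text: str) -> dict:
    """Map block-group type -> (profile, total_bytes, used_bytes)."""
    result = {}
    for line in text.strip().splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip lines in an unexpected format
        kind, profile, total_v, total_u, used_v, used_u = m.groups()
        result[kind] = (profile, to_bytes(total_v, total_u), to_bytes(used_v, used_u))
    return result


FI_DF = """
Data, RAID1: total=1.25TiB, used=1014.01GiB
System, RAID1: total=32.00MiB, used=240.00KiB
Metadata, RAID1: total=15.00GiB, used=13.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

for kind, (profile, total, used) in parse_fi_df(FI_DF).items():
    print(f"{kind:13} {profile:6} {used / total:6.1%} of allocated space used")
```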



Tomasz Chmielewski
http://wpkg.org

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.4.0 - no space left with >1.7 TB free space left

2016-02-08 Thread Tomasz Chmielewski

On 2016-02-08 20:24, Roman Mamedov wrote:

On Mon, 08 Feb 2016 18:22:34 +0900
Tomasz Chmielewski  wrote:


Linux 4.4.0 - btrfs is mainly used to host lots of test containers,
often snapshots, and at times, there is heavy IO in many of them for
extended periods of time. btrfs is on HDDs.


Every few days I'm getting "no space left" in a container running mongo
3.2.1 database. Interestingly, haven't seen this issue in containers
with MySQL. All databases have chattr +C set on their directories.


Hello,

Do you snapshot the parent subvolume which holds the databases? Can you
correlate that perhaps ENOSPC occurs at the time of snapshotting?


Not sure.

With the last error, a snapshot was made at around 06:06, while "no 
space left" was reported at 06:14. Suspiciously close to each other, but 
still, a few minutes apart.


Unfortunately I don't have an error log for the previous cases.



If yes, then
you should try the patch https://patchwork.kernel.org/patch/7967161/

(Too bad this was not included into 4.4.1.)


I'll keep an eye on it, thanks.


Tomasz Chmielewski
http://www.ptraveler.com



Re: 4.4.0 - no space left with >1.7 TB free space left

2016-02-08 Thread Roman Mamedov
On Mon, 08 Feb 2016 21:15:38 +0900
Tomasz Chmielewski  wrote:

> With the last error, a snapshot was made at around 06:06
> "no space left" was reported on 06:14.

If you mean the log that you have posted in your original message, the ENOSPC
happened at 06:06 and 14 seconds, not 06:14.
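The WiredTiger message also carries its own epoch stamp (`[1454825174:633585]`), which gives an independent check of Roman's reading. A minimal sketch, assuming the bracketed pair is seconds:microseconds since the Unix epoch and that the "around 06:06" snapshot time is also UTC; both are assumptions, and `wiredtiger_stamp_to_utc` is a hypothetical helper.

```python
# Sketch: convert the WiredTiger "[seconds:microseconds]" stamp to UTC and
# compare it with the snapshot time mentioned in the thread.
from datetime import datetime, timedelta, timezone


def wiredtiger_stamp_to_utc(stamp: str) -> datetime:
    """'1454825174:633585' -> timezone-aware UTC datetime (assumed epoch sec:usec)."""
    seconds, micros = (int(part) for part in stamp.split(":"))
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=micros)


enospc = wiredtiger_stamp_to_utc("1454825174:633585")
snapshot = datetime(2016, 2, 7, 6, 6, tzinfo=timezone.utc)  # "around 06:06", assumed UTC
print(enospc.isoformat())  # 2016-02-07T06:06:14.633585+00:00
print(enospc - snapshot)   # 0:00:14.633585
```

Under those assumptions the ENOSPC landed about 14 seconds after 06:06, not eight minutes later.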

-- 
With respect,
Roman




Re: Use fast device only for metadata?

2016-02-08 Thread Austin S. Hemmelgarn

On 2016-02-07 15:59, Martin Steigerwald wrote:

On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:

On Sun, 07 Feb 2016 11:06:58 -0800, Nikolaus Rath wrote:

Hello,

I have a large home directory on a spinning disk that I regularly
synchronize between different computers using unison. That takes ages,
even though the number of changed files is typically small. I suspect
most of the time is spent walking through the file system and checking
mtimes.

So I was wondering if I could possibly speed up this operation by
storing all btrfs metadata on a fast, SSD drive. It seems that
mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
file contents in single mode. However, I could not find a way to tell
btrfs to use a device *only* for metadata. Is there a way to do that?

Also, what is the difference between using "dup" and "raid1" for the
metadata?


You may want to try bcache. It will speed up random access, which is
probably the main cause of your slow sync. Unfortunately, it requires
you to reformat your btrfs partitions to add a bcache superblock, but
it's worth the effort.

I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
to typically 1.5-3 depending on how much data changed.


An alternative is dm-cache; I think it doesn't require recreating the
filesystem.

That's correct, dm-cache can use a regular underlying storage device. 
This of course has potential implications for a multi-device filesystem 
(it can seriously confuse BTRFS and cause data corruption), but it works 
just fine for a single-device filesystem.  This makes it a bit easier to 
test-run, but also means you need more devices (internally it uses three: 
a backing device, a cache device, and a metadata device that persistently 
maps between the two).  It's really easy to set up, though, if you have a 
recent version of LVM built with dm-cache support.


In general, bcache takes a bit more setup, but avoids the multi-device 
issues, and importantly, doesn't require LVM or dmsetup (which are 
usually pretty big packages on many distros).  The caveat with bcache 
though is that there have been issues in the past with data integrity 
when used with BTRFS, but if you're on a recent kernel (at least 4.0 if 
you're using BTRFS for actual data storage), you should have no issues.




Re: Fwd: Unmountable fs after power outage

2016-02-08 Thread Radek Sprta
2016-01-22 16:25 GMT+01:00 Hugo Mills :
>Try mounting with -orecovery. That's the main approach for dealing
> with transid failures.

Tried that, but I still get the same error.


Re: 4.4.0 - no space left with >1.7 TB free space left

2016-02-08 Thread Roman Mamedov
On Mon, 08 Feb 2016 18:22:34 +0900
Tomasz Chmielewski  wrote:

> Linux 4.4.0 - btrfs is mainly used to host lots of test containers, 
> often snapshots, and at times, there is heavy IO in many of them for 
> extended periods of time. btrfs is on HDDs.
> 
> 
> Every few days I'm getting "no space left" in a container running mongo 
> 3.2.1 database. Interestingly, haven't seen this issue in containers 
> with MySQL. All databases have chattr +C set on their directories.

Hello,

Do you snapshot the parent subvolume which holds the databases? Can you
correlate that perhaps ENOSPC occurs at the time of snapshotting? If yes, then
you should try the patch https://patchwork.kernel.org/patch/7967161/

(Too bad this was not included into 4.4.1.)

-- 
With respect,
Roman




Re: "layout" of a six drive raid10

2016-02-08 Thread Kai Krakow
On Tue, 9 Feb 2016 01:42:40 + (UTC), Duncan <1i5t5.dun...@cox.net> wrote:

> Tho I'd consider benchmarking or testing, as I'm not sure btrfs raid1
> on spinning rust will in practice fully saturate the gigabit
> Ethernet, particularly as it gets fragmented (which COW filesystems
> such as btrfs tend to do much more so than non-COW, unless you're
> using something like the autodefrag mount option from the get-go, as
> I do here, tho in that case, striping won't necessarily help a lot
> either).
> 
> If you're concerned about getting the last bit of performance
> possible, I'd say raid10, tho over the gigabit ethernet, the
> difference isn't likely to be much.

If performance is an issue, I suggest putting an SSD and bcache into
the equation. I have seen very nice performance improvements with that,
especially with writeback caching (random writes go to bcache first,
then to the hard disk in the background during idle time).

As far as I know, bcache currently has no native redundancy, so the
cache can only be one SSD. It may be possible to use two bcache devices
and assign the btrfs members alternately to them, though btrfs may then
decide to put both mirrors on the same bcache. Alternatively, you could
put bcache on LVM or mdraid, but I would not do it. On the bcache list,
multiple people had problems with that, including btrfs corruption
beyond repair.

On the other hand, you could simply go with bcache writearound caching
(only reads become cached) or writethrough caching (writes go in
parallel to bcache and btrfs). If the SSD dies, btrfs will still be
perfectly safe in this case.
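Whether losing the SSD is harmless depends on which mode is active. bcache exposes it in sysfs as `/sys/block/bcache<N>/bcache/cache_mode`; reading that file lists every mode with the active one in square brackets (for example `writethrough [writeback] writearound none`). A minimal sketch that parses the string and flags the one mode where acknowledged writes may exist only on the SSD; the device name `bcache0` and the helper names are illustrative.

```python
# Sketch: read bcache's cache_mode and report whether losing the SSD could lose data.
from pathlib import Path

# Mode in which writes can be acknowledged before they reach the backing disk.
UNSAFE_ON_SSD_LOSS = {"writeback"}


def active_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) entry from a bcache cache_mode string."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode in {sysfs_text!r}")


def ssd_loss_is_safe(mode: str) -> bool:
    # writethrough, writearound and none keep the backing device up to date,
    # so the filesystem on it survives if the cache device dies.
    return mode not in UNSAFE_ON_SSD_LOSS


def read_cache_mode(dev: str = "bcache0") -> str:
    return active_mode(Path(f"/sys/block/{dev}/bcache/cache_mode").read_text())


print(active_mode("writethrough [writeback] writearound none"))  # writeback
print(ssd_loss_is_safe(active_mode("[writethrough] writeback writearound none")))  # True
```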

If you are going with one of the latter options, the tuning knobs of
bcache may help you actually cache not only random accesses to bcache
but also linear accesses. It should help to saturate a gigabit link.
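The knob most likely meant here (an assumption) is `sequential_cutoff` in `/sys/block/bcache<N>/bcache/`: bcache bypasses the cache for sequential I/O beyond this threshold (4 MB by default), and writing `0` disables the bypass so linear reads are cached too. A minimal sketch; the sysfs root is a parameter so the demo can run against a scratch directory, and on a real system the write needs root. `set_sequential_cutoff` is a hypothetical helper.

```python
# Sketch: disable bcache's sequential-I/O bypass so linear accesses get cached too.
import tempfile
from pathlib import Path


def set_sequential_cutoff(dev: str = "bcache0", value: int = 0,
                          sysfs_root: Path = Path("/sys/block")) -> Path:
    """Write `value` (bytes; 0 = never bypass) to the device's sequential_cutoff knob."""
    knob = sysfs_root / dev / "bcache" / "sequential_cutoff"
    knob.write_text(f"{value}\n")
    return knob


# Demo against a scratch directory instead of the real /sys/block.
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    (root / "bcache0" / "bcache").mkdir(parents=True)
    demo_value = set_sequential_cutoff(sysfs_root=root).read_text()
print(repr(demo_value))  # '0\n'
```

Caching sequential I/O can evict useful random-access data from a small SSD, so this is a trade-off worth benchmarking rather than a default.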

Currently, SanDisk offers a pretty cheap (not top performance) 500GB
drive which should cover this use case perfectly, though I'm not sure
how stable this drive is with bcache. So far I have only tested the
Crucial MX100 and Samsung Evo 840, both working very stably with the
latest kernel and discard enabled, no mdraid or LVM involved.

-- 
Regards,
Kai

Replies to list-only preferred.



Re: [PATCH v4.1 00/23] xfstests: test the nfs/cifs/btrfs/xfs reflink/dedupe ioctls

2016-02-08 Thread Dave Chinner
On Mon, Feb 08, 2016 at 05:11:45PM -0800, Darrick J. Wong wrote:
> Happy New Year!
> 
> Dave Chinner: I've renumbered the new tests and pushed to github[3] if
> you'd like to pull.

Can you include the commit ID I should see at the head of the
tree so I can confirm I'm pulling the right branch?

BTW, git doesn't like this:

https://github.com/djwong/xfstests/tree/for-dave

What git really wants is the tree url with a separate branch name
like so:

https://github.com/djwong/xfstests.git for-dave

(i.e. the typical output from a git request-pull command)
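The conversion Dave asks for is mechanical: strip the `/tree/<branch>` part of the browser URL and pass the branch separately, as in `git pull https://github.com/djwong/xfstests.git for-dave`. A small sketch, assuming the simple GitHub URL shape shown here and that everything after `/tree/` is the branch name (a URL that also points into a subdirectory would be misparsed); `tree_url_to_pull_args` is a hypothetical helper.

```python
# Sketch: turn a GitHub "tree" browser URL into the "<repo-url> <branch>" pair
# that git fetch/pull (and git request-pull output) use.
from typing import Tuple
from urllib.parse import urlsplit


def tree_url_to_pull_args(url: str) -> Tuple[str, str]:
    parts = urlsplit(url)
    repo_path, sep, branch = parts.path.partition("/tree/")
    if not sep or not branch:
        raise ValueError(f"not a /tree/<branch> URL: {url}")
    # Assumption: everything after /tree/ is the branch name.
    repo_url = f"{parts.scheme}://{parts.netloc}{repo_path}.git"
    return repo_url, branch


print(*tree_url_to_pull_args("https://github.com/djwong/xfstests/tree/for-dave"))
# https://github.com/djwong/xfstests.git for-dave
```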

> This is a (no longer) small patch set against the reflink/dedupe test
> cases in xfstests.  The first four patches fix errors in the existing
> reflink tests, some of which are from Christoph Hellwig.
> 
> Patches 5-6 refactor the dmerror code so that we can use it to
> simulate transient IO errors, then use this code to test that
> unwritten extent conversion does NOT happen after a directio write to
> an unwritten extent hits a disk error.   Due to a bug in the VFS
> directio code, ext4 can disclose stale disk contents if an aio dio
> write fails; XFS suffers this problem for any failing dio write to an
> unwritten extent.  Christoph's kernel patchset titled "vfs/xfs:
> directio updates to ease COW handling V2" (and a separate ext4 warning
> cleanup) is needed to fix this.
> 
> Patches 7-9, 13, 15, 17, 18, 20, 21, and 23 exercise various parts
> of the copy on write behavior that are necessary to support shared
> blocks.  The earlier patches focus on correct CoW behavior in the
> presence of IO errors during the copy-write, and the later patches
> focus on XFS' new cow-extent-size hint that greatly reduces
> fragmentation due to copy on write behavior by encouraging the
> allocator to allocate larger extents of replacement blocks.
> 
> Patches 10-12 and 14 perform stress testing on reflink and CoW to
> check the behaviors when we get close to maximum refcount, when we
> specify obnoxiously large offsets and lengths, and when we try to
> reflink millions of extents at a time.
> 
> Patch 16 tests quota accounting behavior when reflink is enabled.
> 
> Patch 19 adds a few tests for the XFS reverse mapping btree to ensure
> that things like metadump and growfs work correctly.
> 
> Patch 22 checks that get_bmapx and fiemap (on XFS) correctly flag
> extents as having shared blocks.  XFS now follows btrfs and ocfs2
> FIEMAP behavior such that if any blocks of a file's extent are shared,
> the whole extent is marked shared.  This is in contrast to earlier
> XFS-only behavior that reported shared and non-shared regions as
> separate extents.

This may change - xfs_bmap doesn't combine extents in its output
even if they are adjacent. For debugging purposes (which is what
xfs_bmap/fiemap is for), it's much better to be able to see the
exact extent layout and block sharing.

I suspect the solution of least surprise is to make fiemap behave
like the other filesystems, and make xfs_bmap behave in a manner
that is useful to us :P
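To make the two FIEMAP behaviours discussed above concrete: for one physical extent in which only some blocks are shared, the btrfs/ocfs2 style reports a single extent carrying the shared flag, while the earlier XFS-only style splits it into shared and unshared runs. This is a toy model, not kernel code; `FIEMAP_EXTENT_SHARED` uses the flag value from `<linux/fiemap.h>`, and everything else is illustrative.

```python
# Toy model of the two FIEMAP reporting styles for a partially shared extent.
from itertools import groupby
from typing import List, NamedTuple

FIEMAP_EXTENT_SHARED = 0x00002000  # flag value from <linux/fiemap.h>


class Extent(NamedTuple):
    start: int   # first block
    length: int  # number of blocks
    flags: int


def whole_extent_style(shared: List[bool], start: int = 0) -> List[Extent]:
    """btrfs/ocfs2 style: flag the whole extent if any block in it is shared."""
    flags = FIEMAP_EXTENT_SHARED if any(shared) else 0
    return [Extent(start, len(shared), flags)]


def split_style(shared: List[bool], start: int = 0) -> List[Extent]:
    """Earlier XFS-only style: report shared and unshared runs separately."""
    out, pos = [], start
    for is_shared, run in groupby(shared):
        n = len(list(run))
        out.append(Extent(pos, n, FIEMAP_EXTENT_SHARED if is_shared else 0))
        pos += n
    return out


blocks = [False, False, True, True, False]  # blocks 2-3 shared with another file
print(whole_extent_style(blocks))
print(split_style(blocks))
```

The split form is the one that shows the exact block sharing Dave wants for debugging; the whole-extent form loses that detail.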

> If you're going to start using this mess, you probably ought to just
> pull from my github trees for kernel[1], xfsprogs[2], xfstests[3],
> xfs-docs[4], and man-pages[5].  All tests should pass on XFS.   I
> tried btrfs this weekend and it failed 166, 175, 182, 266, 271, 272,
> 278, 281, 297, 298, 304, 333, and 334.  ocfs2 (when I jury-rigged it
> to run the cp_reflink tests) seemed to have a quota bug and crashes
> hard in 284 (but was otherwise fine).

Fun fun fun. I'll look through the patches, and if there's nothing
major I'll pull it in once I get a commit ID from you.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v4.1 00/23] xfstests: test the nfs/cifs/btrfs/xfs reflink/dedupe ioctls

2016-02-08 Thread Darrick J. Wong
Aha, /me finds git request-pull.  Sorry for the noise.

--D

The following changes since commit d98149c205559950c03d6b1d539e45fd35b5630e:

  Fix prerequisite packages to build fstests on Ubuntu (2016-02-08 09:27:15 +1100)

are available in the git repository at:

  https://github.com/djwong/xfstests for-dave

for you to fetch changes up to 9799e5c5397b7aa14dbc660645ef4ccaf5418c78:

  reflink: test reflink+cow+enospc all at the same time (2016-02-08 17:07:36 -0800)


Darrick J. Wong (23):
  generic/182: this is a dedupe test, check for dedupe
  xfstests: filter whitespace in 128 and 132
  xfstests: make _scratch_mkfs_blocksized usable
  reflink: remove redundant filesystem checks from the end of the tests
  common/dmerror: add some more dmerror routines
  dio unwritten conversion bug tests
      reflink: test intersecting CoW and falloc/fpunch/fzero/fcollapse/finsert/ftrunc
  reflink: test CoW behavior with IO errors
  reflink: test CoW operations against the source file
  xfs: more reflink tests
  reflink: ensure that we can handle reflinking a lot of extents
  xfs/122: support refcount/rmap data structures
  xfs: test fragmentation characteristics of copy-on-write
  reflink: high offset reflink and dedupe tests
  reflink: test xfs cow behavior when the filesystem crashes
  reflink: test quota accounting
  reflink: test CoW across a mixed range of block types with cowextsize set
  xfs: test the automatic cowextsize extent garbage collector
  xfs: test rmapbt functionality
  reflink: test aio copy on write
  xfs: aio cow tests
  xfs: test xfs_getbmapx behavior with shared extents
  reflink: test reflink+cow+enospc all at the same time

 .gitignore  |   2 +
 common/dmerror  |  27 ++-
 common/rc   |  52 +
 common/reflink  |  32 +--
 common/xfs  |  63 ++
 src/Makefile|   2 +-
 src/aio-dio-regress/aiocp.c | 489 
 src/punch-alternating.c |  59 ++
 tests/btrfs/100 |   2 +-
 tests/btrfs/101 |   2 +-
 tests/generic/157   |   1 -
 tests/generic/158   |   1 -
 tests/generic/161   |   1 -
 tests/generic/162   |   1 -
 tests/generic/163   |   1 -
 tests/generic/164   |   1 -
 tests/generic/165   |   1 -
 tests/generic/166   |   7 +-
 tests/generic/167   |   7 +-
 tests/generic/168   |   1 -
 tests/generic/170   |   1 -
 tests/generic/171   |   1 -
 tests/generic/172   |   1 -
 tests/generic/173   |   1 -
 tests/generic/174   |   1 -
 tests/generic/175   |  43 ++--
 tests/generic/175.out   |   6 +
 tests/generic/176   |  51 +++--
 tests/generic/176.out   |   4 +-
 tests/generic/182   |   9 +-
 tests/generic/183   |   1 -
 tests/generic/185   |   1 -
 tests/generic/186   |   1 -
 tests/generic/187   |   1 -
 tests/generic/188   |   1 -
 tests/generic/189   |   1 -
 tests/generic/190   |   1 -
 tests/generic/191   |   1 -
 tests/generic/194   |   1 -
 tests/generic/195   |   1 -
 tests/generic/196   |   3 +-
 tests/generic/197   |   3 +-
 tests/generic/199   |   1 -
 tests/generic/200   |   1 -
 tests/generic/201   |   1 -
 tests/generic/202   |   1 -
 tests/generic/203   |   1 -
 tests/generic/205   |   1 -
 tests/generic/206   |   1 -
 tests/generic/216   |   1 -
 tests/generic/217   |   1 -
 tests/generic/218   |   1 -
 tests/generic/220   |   1 -
 tests/generic/222   |   1 -
 tests/generic/227   |   1 -
 tests/generic/229   |   1 -
 tests/generic/238   |   1 -
 tests/generic/242   |   1 -
 tests/generic/243   |   1 -
 tests/generic/250   | 104 ++
 tests/generic/250.out   |  10 +
 tests/generic/252   | 107 ++
 tests/generic/252.out   |  10 +
 tests/generic/253   |  93 +
 tests/generic/253.out   |  13 ++
 tests/generic/254   |  93 +
 tests/generic/254.out   |  13 ++
 tests/generic/259   |  93 +
 tests/generic/259.out   |  13 ++
 tests/generic/261   |  93 +
 tests/generic/261.out   |  13 ++
 tests/generic/262   |  96 +
 tests/generic/262.out   |  13 ++
 tests/generic/264   |  93 +
 tests/generic/264.out   |  13 ++
 tests/generic/265   | 102 +
 tests/generic/265.out   |  11 +
 tests/generic/266   | 103 ++
 tests/generic/266.out   |  12 ++
 tests/generic/267   | 103 ++
 tests/generic/267.out   |  10 +
 

Re: [PATCH 12/23] xfs/122: support refcount/rmap data structures

2016-02-08 Thread Dave Chinner
On Mon, Feb 08, 2016 at 05:13:03PM -0800, Darrick J. Wong wrote:
> Include the refcount and rmap structures in the golden output.
> 
> Signed-off-by: Darrick J. Wong 
> ---
>  tests/xfs/122 |3 +++
>  tests/xfs/122.out |4 
>  tests/xfs/group   |2 +-
>  3 files changed, 8 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/tests/xfs/122 b/tests/xfs/122
> index e6697a2..758cb50 100755
> --- a/tests/xfs/122
> +++ b/tests/xfs/122
> @@ -90,6 +90,9 @@ xfs_da3_icnode_hdr
>  xfs_dir3_icfree_hdr
>  xfs_dir3_icleaf_hdr
>  xfs_name
> +xfs_owner_info
> +xfs_refcount_irec
> +xfs_rmap_irec
>  xfs_alloctype_t
>  xfs_buf_cancel_t
>  xfs_bmbt_rec_32_t

So this is going to cause failures on any userspace that doesn't
know about these new types, right?

Should these be conditional in some way?

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 10/23] xfs: more reflink tests

2016-02-08 Thread Dave Chinner
On Mon, Feb 08, 2016 at 05:12:50PM -0800, Darrick J. Wong wrote:
> Create a couple of XFS-specific tests -- one to check that growing
> and shrinking the refcount btree works and a second one to check
> what happens when we hit maximum refcount.
> 
> Signed-off-by: Darrick J. Wong 
.
> +# real QA test starts here
> +_supported_os Linux
> +_supported_fs xfs
> +_require_scratch_reflink
> +_require_cp_reflink

> +
> +test -x "$here/src/punch-alternating" || _notrun "punch-alternating not built"

I suspect we need a _require rule for checking that something in
the test src directory has been built.

> +echo "Check scratch fs"
> +umount "$SCRATCH_MNT"
> +echo "check refcount after removing all files" >> "$seqres.full"
> +"$XFS_DB_PROG" -c 'agf 0' -c 'addr refcntroot' -c 'p recs[1]' "$SCRATCH_DEV" >> "$seqres.full"
> +"$XFS_REPAIR_PROG" -o force_geometry -n "$SCRATCH_DEV" >> "$seqres.full" 2>&1
> +res=$?
> +if [ $res -eq 0 ]; then
> + # If repair succeeds then format the device so that the post-test
> + # check doesn't fail due to the single AG.
> + _scratch_mkfs >> "$seqres.full" 2>&1
> +else
> + _fail "xfs_repair fails"
> +fi
> +
> +# success, all done
> +status=0
> +exit

This is what _require_scratch_nocheck avoids.

i.e. do this instead:

_require_scratch_nocheck
.

"$XFS_REPAIR_PROG" -o force_geometry -n "$SCRATCH_DEV" >> "$seqres.full" 2>&1 
status=$?
exit

Also, we really don't need the quotes around these global
variables.  They are just noise and lots of stuff will break if
those variables are set to something that requires them to be
quoted.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 06/23] dio unwritten conversion bug tests

2016-02-08 Thread Dave Chinner
On Mon, Feb 08, 2016 at 05:12:23PM -0800, Darrick J. Wong wrote:
> Check that we don't expose old disk contents when a directio write to
> an unwritten extent fails due to IO errors.  This primarily affects
> XFS and ext4.
> 
> Signed-off-by: Darrick J. Wong 
.
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -252,7 +252,9 @@
>  247 auto quick rw
>  248 auto quick rw
>  249 auto quick rw
> +250 auto quick
>  251 ioctl trim
> +252 auto quick

Also should be in the prealloc group if we are testing unwritten
extent behaviour and the rw group because it's testing IO.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: "layout" of a six drive raid10

2016-02-08 Thread Duncan
boli posted on Mon, 08 Feb 2016 23:19:52 +0100 as excerpted:

> Hi
> 
> I'm trying to figure out what a six drive btrfs raid10 would look like.

> It could mean that stripes are split over two raid1 sets of three
> devices each. The sentence "Every stripe is split across to exactly 2
> RAID-1 sets" would lead me to believe this.
> 
> However, earlier it says for raid0 that "stripe[s are] split across as
> many devices as possible". Which for six drives would be: stripes are
> split over three raid1 sets of two devices each.
> 
> Can anyone enlighten me as to which is correct?

Hugo's correct, and this is pretty much restating what he did.  Sometimes 
I find that reading things again in different words helps me better 
understand the concept, and this post is made with that in mind.

At present, btrfs has only two-way mirroring, not N-way.  So any raid 
level that includes mirroring will have exactly two copies, no matter the 
number of devices.  (FWIW, N-way-mirroring is on the roadmap, but who 
knows when it'll come, and like raid56 mode, it will likely take some 
time to stabilize even once it does.)

What that means for a six device raid1 or raid10 is, still exactly two 
copies of everything, with raid1 simply being three independent chunks, 
two copies each, and raid10 being two copies of a three-device stripe.
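Duncan's "two copies of a three-device stripe" can be made concrete by mapping a logical offset inside one raid10 chunk: with six stripes and two copies, the stripes form three mirrored pairs, and consecutive stripe-length pieces rotate across the pairs. A simplified sketch modelled on btrfs's raid10 mapping arithmetic, assuming a 64 KiB stripe length and a chunk whose stripe list is simply devices 0-5 in order (a real chunk lists whichever devices the allocator picked); `raid10_devices` is a hypothetical helper.

```python
# Sketch: which devices hold a given logical offset inside one btrfs raid10 chunk.
STRIPE_LEN = 64 * 1024  # assumed stripe length (64 KiB)
SUB_STRIPES = 2         # raid10 keeps exactly two copies


def raid10_devices(offset_in_chunk: int, num_stripes: int = 6) -> tuple:
    """Return (copy_a, copy_b, offset within each device's part of the chunk)."""
    pairs = num_stripes // SUB_STRIPES              # 6 devices -> 3 mirrored pairs
    stripe_nr, within = divmod(offset_in_chunk, STRIPE_LEN)
    row, pair = divmod(stripe_nr, pairs)            # rotate across the pairs
    first = pair * SUB_STRIPES
    return first, first + 1, row * STRIPE_LEN + within


for i in range(4):
    print(i, raid10_devices(i * STRIPE_LEN))
# 0 (0, 1, 0)
# 1 (2, 3, 0)
# 2 (4, 5, 0)
# 3 (0, 1, 65536)
```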

> Reason I'm asking is that I'm deciding on a suitable raid level for a
> new DIY NAS box. I'd rather not use btrfs raid6 (for now).

Agreed and I think wise choice. =:^)  I'd still be a bit cautious of 
btrfs raid56, as I don't think it's quite to the level of stability that 
other btrfs raid types are, just yet.  I expect to be much more 
comfortable recommending it in another couple kernel cycles.

> The first
> alternative I thought of was raid10. Later I learned how btrfs raid1
> works and figured it might be better suited for my use case: Striping
> the data over multiple raid1 sets doesn't really help, as transfer
> from/to my box will be limited by gigabit ethernet anyway, and a single
> drive can saturate that.
> 
> Thoughts on this would also be appreciated.

Agreed, again. =:^)

Tho I'd consider benchmarking or testing, as I'm not sure btrfs raid1 on 
spinning rust will in practice fully saturate the gigabit Ethernet, 
particularly as it gets fragmented (which COW filesystems such as btrfs 
tend to do much more so than non-COW, unless you're using something like 
the autodefrag mount option from the get-go, as I do here, tho in that 
case, striping won't necessarily help a lot either).

If you're concerned about getting the last bit of performance possible, 
I'd say raid10, tho over the gigabit ethernet, the difference isn't 
likely to be much.

OTOH, if you're more concerned about ease of maintenance, replacing 
devices, etc, I believe raid1 is a bit less complex both in code terms 
(where less code complexity means less chance of bugs) and in 
administration, at least conceptually, tho in practice the administration 
is going to be very close to the same as well.

So I'd tend to lean toward raid1 for a use-case thruput limited to gigabit 
Ethernet speeds, even on spinning rust, as I think there may be a bit of 
a difference in speed vs raid10, but I doubt it'll be much due to the 
gigabit thruput limit, and I'd consider the lower complexity of raid1 to 
offset that.

> As a bonus I was wondering how btrfs raid1 is laid out in general, in
> particular with even and odd numbers of drives. A pair is trivial. For
> three drives I think a "ring setup" with each drive sharing half of its
> data with another drive. But how is it with four drives – are they
> organized as two pairs, or four-way, or …

For raid1, allocation is done in pairs, with each allocation taking the 
device with the most space left, except that both copies can't be on a 
single device, even if for instance you have a 3 TB device and the rest 
are 1 TB or smaller.  That case would result in one copy of each pair on 
the 3 TB device, one copy on whatever device has the most space left of 
the others.

Which on a filesystem with all equal sized devices, tends to result in 
round-robin allocation, tho of course in the odd number of devices case, 
there will always be at least one device that has either more or less 
allocation by a one-chunk margin.  (Tho it can be noted that metadata 
chunks are smaller than data chunks, and while Hugo noted the nominal 1 
GiB data chunk size and 256 MiB metadata chunk size, at the 100 GiB plus 
per device scale, chunks can be larger, up to 10 GiB per data chunk, and of 
course smaller on very small devices, so the 1GiB-data/256MiB-metadata 
values are indeed only nominal, but they still give you some idea of the 
relative size.)
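The most-space-first rule can be illustrated with a toy simulation. This is a simplified sketch under stated assumptions, not the kernel's allocator: chunk sizes are fixed abstract units (ignoring the data/metadata size differences just mentioned), and ties between equally free devices go to the lower device index, which is an assumption of the model.

```python
from typing import List, Tuple


def allocate_raid1(free: List[int], chunk: int = 1) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Toy model of btrfs raid1 chunk placement.

    Each chunk gets two copies on two *different* devices, chosen as the two
    devices with the most unallocated space. Ties go to the lower index
    (an assumption of this sketch). Stops when no two devices can fit a copy.
    """
    if len(free) < 2:
        raise ValueError("raid1 needs at least two devices")
    free = list(free)  # don't mutate the caller's list
    placements = []
    while True:
        # Device indices ordered by remaining space, largest first;
        # sorted() is stable, so equally free devices stay in index order.
        order = sorted(range(len(free)), key=lambda i: -free[i])
        first, second = order[0], order[1]
        if free[second] < chunk:
            break  # at most one device still has room: no valid mirror pair
        free[first] -= chunk
        free[second] -= chunk
        placements.append((first, second))
    return placements, free


if __name__ == "__main__":
    # Four equal devices: allocation settles into two alternating pairs.
    print(allocate_raid1([2, 2, 2, 2])[0])  # [(0, 1), (2, 3), (0, 1), (2, 3)]
    # Three equal devices: round-robin over all three pairings.
    print(allocate_raid1([2, 2, 2])[0])     # [(0, 1), (2, 0), (1, 2)]
    # One big device plus two small ones: the big one holds a copy of every chunk.
    print(allocate_raid1([3, 1, 1]))        # ([(0, 1), (0, 2)], [1, 0, 0])
```

With four equal devices the pairs (0, 1) and (2, 3) emerge purely from the rule, not from any explicit pairing logic, which is the effect described for the four-device case.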

So a btrfs raid1 on four equally sized devices will indeed result in two 
pairs, but simply because of the most-space-available allocation rule, 
not because it's forced to pairs of pairs.  And with unequally sized 
devices, the device with the most space will 

Re: Use fast device only for metadata?

2016-02-08 Thread Kai Krakow
On Mon, 08 Feb 2016 13:44:17 -0800,
Nikolaus Rath wrote:

> On Feb 07 2016, Martin Steigerwald wrote:
> > On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:
> >> On Sun, 07 Feb 2016 11:06:58 -0800,
> >> Nikolaus Rath wrote:
> >> > Hello,
> >> > 
> >> > I have a large home directory on a spinning disk that I regularly
> >> > synchronize between different computers using unison. That takes
> >> > ages, even though the amount of changed files is typically
> >> > small. I suspect most of the time is spent walking through the
> >> > file system and checking mtimes.
> >> > 
> >> > So I was wondering if I could possibly speed-up this operation by
> >> > storing all btrfs metadata on a fast, SSD drive. It seems that
> >> > mkfs.btrfs allows me to put the metadata in raid1 or dup mode,
> >> > and the file contents in single mode. However, I could not find
> >> > a way to tell btrfs to use a device *only* for metadata. Is
> >> > there a way to do that?
> >> > 
> >> > Also, what is the difference between using "dup" and "raid1" for
> >> > the metadata?
> >> 
> >> You may want to try bcache. It will speed up random access which is
> >> probably the main cause for your slow sync. Unfortunately it
> >> requires you to reformat your btrfs partitions to add a bcache
> >> superblock. But it's worth the effort.
> >> 
> >> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+
> >> hours to typically 1.5-3 depending on how much data changed.
> >
> > An alternative is using dm-cache; I think it doesn't need the
> > filesystem to be recreated.
> 
> Yes, I tried that already but it didn't improve things at all. I
> wrote a message to the lvm list though, so maybe someone will be able
> to help.
> 
> Otherwise I'll give bcache a shot. I've avoided it so far because of
> the need to reformat and because of rumours that it doesn't work well
> with LVM or BTRFS. But it sounds as if that's not the case.

I'm using bcache+btrfs myself and it has been bulletproof so far, even
after unintentional resets or power outages. It's important, tho, NOT to
put any storage layer between bcache and your devices or between btrfs
and your device, as there are reports that it becomes unstable with md or
lvm involved. In my setup I can even use discard/trim without problems.
I'd recommend a current kernel, tho.

Since it requires reformatting, it's a big PITA, but it's worth the
effort. From its design, it appeared much more effective and stable
than dm-cache. You could even format a bcache superblock "just in case"
and add an SSD later. Without an SSD, bcache will just work in passthru
mode. Actually, I started to format all my storage with a bcache
superblock "just in case". It is similar to having another partition
table folded inside, so it doesn't hurt (except that you need bcache-probe
in the initrd to detect the contained filesystems).

-- 
Regards,
Kai

Replies to list-only preferred.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4.1 00/23] xfstests: test the nfs/cifs/btrfs/xfs reflink/dedupe ioctls

2016-02-08 Thread Darrick J. Wong
On Tue, Feb 09, 2016 at 06:21:06PM +1100, Dave Chinner wrote:
> On Mon, Feb 08, 2016 at 05:11:45PM -0800, Darrick J. Wong wrote:
> > Happy New Year!
> > 
> > Dave Chinner: I've renumbered the new tests and pushed to github[3] if
> > you'd like to pull.
> 
> Can you include the commit ID I should see at the head of the
> tree so I can confirm I'm pulling the right branch?

Heh, surprisingly, I've never ever sent a pull request to anyone, anywhere. :)

HEAD is 9799e5c5397b7aa14dbc660645ef4ccaf5418c78

> BTW, git doesn't like this:
> 
> https://github.com/djwong/xfstests/tree/for-dave
> 
> What git really wants is the tree url with a separate branch name
> like so:
> 
> https://github.com/djwong/xfstests.git for-dave
> 
> (i.e. the typical output from a git request-pull command)
> 
> > This is a (no longer) small patch set against the reflink/dedupe test
> > cases in xfstests.  The first four patches fix errors in the existing
> > reflink tests, some of which are from Christoph Hellwig.
> > 
> > Patches 5-6 refactor the dmerror code so that we can use it to
> > simulate transient IO errors, then use this code to test that
> > unwritten extent conversion does NOT happen after a directio write to
> > an unwritten extent hits a disk error.   Due to a bug in the VFS
> > directio code, ext4 can disclose stale disk contents if an aio dio
> > write fails; XFS suffers this problem for any failing dio write to an
> > unwritten extent.  Christoph's kernel patchset titled "vfs/xfs:
> > directio updates to ease COW handling V2" (and a separate ext4 warning
> > cleanup) is needed to fix this.
> > 
> > Patches 7-9, 13, 15, 17, 18, 20, 21, and 23 exercise various parts
> > of the copy on write behavior that are necessary to support shared
> > blocks.  The earlier patches focus on correct CoW behavior in the
> > presence of IO errors during the copy-write, and the later patches
> > focus on XFS' new cow-extent-size hint that greatly reduces
> > fragmentation due to copy on write behavior by encouraging the
> > allocator to allocate larger extents of replacement blocks.
> > 
> > Patches 10-12 and 14 perform stress testing on reflink and CoW to
> > check the behaviors when we get close to maximum refcount, when we
> > specify obnoxiously large offsets and lengths, and when we try to
> > reflink millions of extents at a time.
> > 
> > Patch 16 tests quota accounting behavior when reflink is enabled.
> > 
> > Patch 19 adds a few tests for the XFS reverse mapping btree to ensure
> > that things like metadump and growfs work correctly.
> > 
> > Patch 22 checks that get_bmapx and fiemap (on XFS) correctly flag
> > extents as having shared blocks.  XFS now follows btrfs and ocfs2
> > FIEMAP behavior such that if any blocks of a file's extent are shared,
> > the whole extent is marked shared.  This is in contrast to earlier
> > XFS-only behavior that reported shared and non-shared regions as
> > separate extents.
> 
> This may change - xfs_bmap doesn't combine extents in its output
> even if they are adjacent. For debugging purposes (which is what
> xfs_bmap/fiemap is for), it's much better to be able to see the
> exact extent layout and block sharing.
> 
> I suspect the solution of least surprise is to make fiemap behave
> like the other filesystems, and make xfs_bmap behave in a manner
> that is useful to us :P

Hehe.  Well... FIEMAP now /does/ act like the other filesystems.

But perhaps we can do better with getbmapx and show the exact
shared regions.  I thought about adding a flag for that, but...
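The two reporting styles being discussed, whole-extent flagging versus exact shared regions, can be compared on one extent whose blocks are only partly shared. This is an illustrative model only: `Extent`, `fiemap_style` and `exact_regions` are hypothetical names, not the kernel's FIEMAP or getbmapx structures (real FIEMAP reports the state through the FIEMAP_EXTENT_SHARED flag).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Extent:
    start: int    # first logical block of the reported range
    length: int   # number of blocks
    shared: bool  # whether the range is flagged as shared


def fiemap_style(start: int, shared_blocks: List[bool]) -> List[Extent]:
    """btrfs/ocfs2-style report: flag the whole extent shared if any block is."""
    if not shared_blocks:
        return []
    return [Extent(start, len(shared_blocks), any(shared_blocks))]


def exact_regions(start: int, shared_blocks: List[bool]) -> List[Extent]:
    """Earlier XFS-style report: split the extent into runs of equal sharing state."""
    regions: List[Extent] = []
    run_start = 0
    for i in range(1, len(shared_blocks) + 1):
        # Close the current run at the end of the list or when the state flips.
        if i == len(shared_blocks) or shared_blocks[i] != shared_blocks[i - 1]:
            regions.append(Extent(start + run_start, i - run_start, shared_blocks[i - 1]))
            run_start = i
    return regions


if __name__ == "__main__":
    blocks = [False, False, True, True, False]  # blocks 2-3 are reflinked elsewhere
    print(fiemap_style(100, blocks))   # one 5-block extent, shared=True
    print(exact_regions(100, blocks))  # three extents: 2 unshared, 2 shared, 1 unshared
```

The first view is compact but hides where the sharing is; the second keeps the exact layout, which is the debugging property Dave wants from xfs_bmap.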

> > If you're going to start using this mess, you probably ought to just
> > pull from my github trees for kernel[1], xfsprogs[2], xfstests[3],
> > xfs-docs[4], and man-pages[5].  All tests should pass on XFS.   I
> > tried btrfs this weekend and it failed 166, 175, 182, 266, 271, 272,
> > 278, 281, 297, 298, 304, 333, and 334.  ocfs2 (when I jury-rigged it
> > to run the cp_reflink tests) seemed to have a quota bug and crashes
> > hard in 284 (but was otherwise fine).
> 
> Fun fun fun. I'll look through the patches, and if there's nothing
> major I'll pull it in once I get a commit ID from you.

:)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com


Re: "layout" of a six drive raid10

2016-02-08 Thread Kai Krakow
On Tue, 9 Feb 2016 08:02:58 +0100,
Kai Krakow wrote:

> On Tue, 9 Feb 2016 01:42:40 +0000 (UTC),
> Duncan <1i5t5.dun...@cox.net> wrote:
> 
> > Tho I'd consider benchmarking or testing, as I'm not sure btrfs
> > raid1 on spinning rust will in practice fully saturate the gigabit
> > Ethernet, particularly as it gets fragmented (which COW filesystems
> > such as btrfs tend to do much more so than non-COW, unless you're
> > using something like the autodefrag mount option from the get-go, as
> > I do here, tho in that case, striping won't necessarily help a lot
> > either).
> > 
> > If you're concerned about getting the last bit of performance
> > possible, I'd say raid10, tho over the gigabit ethernet, the
> > difference isn't likely to be much.
> 
> If performance is an issue, I suggest putting an SSD and bcache into
> the equation. I have seen very nice performance improvements with that,
> especially with writeback caching (random writes go to bcache first,
> then to the harddisk in background idle time).
> 
> AFAIK it's currently not possible to have native bcache redundancy
> yet, so the cache can only be a single SSD. It may be possible to use
> two bcaches and assign the btrfs members alternately to them, tho
> btrfs may then decide to put both mirrors on the same bcache. On the
> other side, you could put bcache on lvm or mdraid, but I would not do
> it. On the bcache list, multiple people had problems with that,
> including btrfs corruption beyond repair.
> 
> On the other hand, you could simply go with bcache writearound caching
> (only reads become cached) or writethrough caching (writes go in
> parallel to bcache and btrfs). If the SSD dies, btrfs will still be
> perfectly safe in this case.
> 
> If you are going with one of the latter options, the tuning knobs of
> bcache may help you cache not only random accesses but also linear
> accesses. It should help to saturate a gigabit link.
> 
> Currently, SanDisk offers a pretty cheap (not top performance) drive
> with 500GB which should perfectly cover this use case. Tho, I'm not
> sure how stable this drive is with bcache. So far I've only checked the
> Crucial MX100 and Samsung 840 EVO, both working very stably with the
> latest kernel and discard enabled, no mdraid or lvm involved.

BTW: If you are thinking about adding bcache later, keep in mind that
it is practically impossible to do that without reformatting, as bcache
needs to add its own superblock to the backing storage devices (spinning
rust). But it's perfectly okay to format with a bcache superblock even
if you do not use bcache caching with an SSD yet. It will work in passthru
mode until you add the SSD later, so it may be worth starting with a
bcache superblock right from the beginning. It creates a sub-device
like this:

/dev/sda [spinning disk]
`- /dev/bcache0
/dev/sdb [spinning disk]
`- /dev/bcache1

So, you put btrfs on /dev/bcache* then.

If you later add the caching device, "lsblk" will show the following:

/dev/sdc [SSD, ex. 500GB]
`- /dev/bcache0 [harddisk, ex. 2TB]
`- /dev/bcache1 [harddisk, ex. 2TB]

Access to bcache0 and bcache1 will then go thru /dev/sdc as the cache.
Bcache is very good at turning random access patterns into linear
access patterns, in turn reducing seeking noise from the harddisks to a
minimum (you will actually hear the difference). So it quite effectively
reduces the seeking that makes btrfs slow on spinning rust, speeding it
up noticeably.

-- 
Regards,
Kai

Replies to list-only preferred.



Re: [PATCH 12/23] xfs/122: support refcount/rmap data structures

2016-02-08 Thread Darrick J. Wong
On Tue, Feb 09, 2016 at 06:43:30PM +1100, Dave Chinner wrote:
> On Mon, Feb 08, 2016 at 05:13:03PM -0800, Darrick J. Wong wrote:
> > Include the refcount and rmap structures in the golden output.
> > 
> > Signed-off-by: Darrick J. Wong 
> > ---
> >  tests/xfs/122     |    3 +++
> >  tests/xfs/122.out |    4 ++++
> >  tests/xfs/group   |    2 +-
> >  3 files changed, 8 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/tests/xfs/122 b/tests/xfs/122
> > index e6697a2..758cb50 100755
> > --- a/tests/xfs/122
> > +++ b/tests/xfs/122
> > @@ -90,6 +90,9 @@ xfs_da3_icnode_hdr
> >  xfs_dir3_icfree_hdr
> >  xfs_dir3_icleaf_hdr
> >  xfs_name
> > +xfs_owner_info
> > +xfs_refcount_irec
> > +xfs_rmap_irec
> >  xfs_alloctype_t
> >  xfs_buf_cancel_t
> >  xfs_bmbt_rec_32_t
> 
> So this is going to cause failures on any userspace that doesn't
> know about these new types, right?
> 
> Should these be conditional in some way?

I wasn't sure how to handle this -- I could just keep the patch at the head of
my stack (unreleased) until xfsprogs pulls in the appropriate libxfs pieces?
So long as we're not dead certain of the final format of the rmapbt and
refcountbt, there's probably not a lot of value in putting this in (yet).

--D

> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> da...@fromorbit.com


Re: Use fast device only for metadata?

2016-02-08 Thread Qu Wenruo



On 02/08/2016 09:29 PM, Austin S. Hemmelgarn wrote:

On 2016-02-08 08:20, Qu Wenruo wrote:

On 02/08/2016 08:24 PM, Austin S. Hemmelgarn wrote:

On 2016-02-07 15:59, Martin Steigerwald wrote:

On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:

On Sun, 07 Feb 2016 11:06:58 -0800,
Nikolaus Rath wrote:

Hello,

I have a large home directory on a spinning disk that I regularly
synchronize between different computers using unison. That takes ages,
even though the amount of changed files is typically small. I suspect
most of the time is spent walking through the file system and checking
mtimes.

So I was wondering if I could possibly speed-up this operation by
storing all btrfs metadata on a fast, SSD drive. It seems that
mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
file contents in single mode. However, I could not find a way to tell
btrfs to use a device *only* for metadata. Is there a way to do that?

Also, what is the difference between using "dup" and "raid1" for the
metadata?


You may want to try bcache. It will speed up random access which is
probably the main cause for your slow sync. Unfortunately it requires
you to reformat your btrfs partitions to add a bcache superblock. But
it's worth the effort.

I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
to typically 1.5-3 depending on how much data changed.


An alternative is using dm-cache; I think it doesn't need the
filesystem to be recreated.

That's correct, dm-cache can use a regular underlying storage device.
This of course has potential implications for a multi-device filesystem
(it can seriously confuse BTRFS and cause data corruption), but it works
just fine for a single device filesystem.  This makes it a bit easier to
test run, but also means you need more devices (internally it uses three:
one backing device, one cache device, and a metadata device for
persistently mapping between the two).  It's really easy to set up
though if you have a recent version of LVM built with dm-cache support.

In general, bcache takes a bit more setup, but avoids the multi-device
issues, and importantly, doesn't require LVM or dmsetup (which are
usually pretty big packages on many distros).  The caveat with bcache
though is that there have been issues in the past with data integrity
when used with BTRFS, but if you're on a recent kernel (at least 4.0 if
you're using BTRFS for actual data storage), you should have no issues.
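The "at least 4.0" rule of thumb can be checked mechanically against a `uname -r` style string. This is a minimal sketch: the helper name `kernel_at_least` is hypothetical, it only compares the major.minor prefix, and passing it says nothing about other bcache or btrfs bugs.

```python
import platform
import re


def kernel_at_least(release: str, minimum=(4, 0)) -> bool:
    """True if the major.minor prefix of a `uname -r` style string is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release string: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


if __name__ == "__main__":
    # Release strings from the uname output quoted in these threads.
    for rel in ("4.1.7-v7+", "4.2.0-22-generic"):
        print(rel, kernel_at_least(rel))
    if platform.system() == "Linux":
        print("running kernel:", platform.release(), kernel_at_least(platform.release()))
```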


And I just want to add more about using a device *only* for metadata.

The short answer is, unfortunately, NO.

1) Even using bcache/dm-cache, it may still cache small data writes

I'm not quite sure about dm-cache/bcache, but as long as the top-level
filesystem is Btrfs, it won't be possible to restrict data/metadata
to a specific device.

IIRC, while bcache or a similar method may cache most random r/w of
metadata, it's still quite possible that it caches a lot of random r/w
of data too.

And depending on the sector size (minimal data block size) and leaf size
(metadata block size), under a specific workload it's even more likely
to cache small data rather than metadata, as the default sectorsize is
4K while the default leafsize is 16K.
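The point that a block-level cache cannot tell btrfs data from metadata can be sketched as follows. The policy is a hypothetical simplification loosely modelled on the idea behind bcache's `sequential_cutoff` tunable (bypass long sequential streams, cache the rest); it is not bcache's real algorithm. The 4 KiB and 16 KiB request sizes correspond to the default sectorsize and leafsize mentioned above.

```python
from dataclasses import dataclass
from typing import List

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


@dataclass
class Request:
    offset: int  # byte offset on the cached block device
    length: int  # request size in bytes
    kind: str    # "data" or "metadata": known to btrfs, never seen by the cache


def cache_decisions(requests: List[Request], sequential_cutoff: int = 4 * MiB) -> List[bool]:
    """Toy block-cache policy: cache I/O unless it belongs to a sequential run
    longer than `sequential_cutoff`. Note that `req.kind` is never consulted."""
    decisions = []
    prev_end = None
    run = 0
    for req in requests:
        if prev_end is not None and req.offset == prev_end:
            run += req.length  # continues the previous request: sequential stream
        else:
            run = req.length   # jump to a new position: treat as random I/O
        decisions.append(run <= sequential_cutoff)
        prev_end = req.offset + req.length
    return decisions


if __name__ == "__main__":
    reqs = [
        Request(0, 4 * KiB, "data"),              # one 4K data sector
        Request(10 * GiB, 16 * KiB, "metadata"),  # one 16K metadata leaf
    ]
    # A large sequential file read of 8 x 1 MiB.
    reqs += [Request(20 * GiB + i * MiB, MiB, "data") for i in range(8)]
    print(cache_decisions(reqs))
    # [True, True, True, True, True, True, False, False, False, False]
```

The small random data write and the metadata leaf are treated identically; only the access pattern matters.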

The mention of dm-cache/bcache was more intended as an alternative,
since BTRFS currently can't do what Nikolaus was trying to achieve.
Neither will give quite the performance profile that a dedicated
metadata device might, but they should still significantly improve
general performance.  In essence, these function for BTRFS like L2ARC on
an SSD does for ZFS.


2) Btrfs has no special preference in chunk allocation.

Btrfs just allocates chunks according to the devices' unallocated space.
So even if there is a huge TB- or PB-sized spinning device and a GB-sized
SSD, btrfs will just treat them according to their unallocated space.
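This second point can be reduced to a few lines. In this hypothetical toy model each single-profile chunk simply goes to the device with the most unallocated space; the device names and GiB sizes are made up, and real btrfs also handles RAID profiles, stripe widths and varying chunk sizes. Note that neither device speed nor chunk type is an input.

```python
from typing import Dict, List


def place_single_chunks(unallocated: Dict[str, int], chunk_size: int, count: int) -> List[str]:
    """Toy model: put each single-profile chunk on the device with the most
    unallocated space. Whether the chunk holds data or metadata is not an input."""
    free = dict(unallocated)  # copy so the caller's dict is untouched
    placed = []
    for _ in range(count):
        dev = max(free, key=free.get)  # ties go to the first device listed
        if free[dev] < chunk_size:
            break  # no device can hold another chunk
        free[dev] -= chunk_size
        placed.append(dev)
    return placed


if __name__ == "__main__":
    # Sizes in GiB: a small SSD next to a large spinning disk.
    print(place_single_chunks({"ssd": 120, "hdd": 4000}, chunk_size=1, count=5))
    # ['hdd', 'hdd', 'hdd', 'hdd', 'hdd']
```

Until the big disk's unallocated space drops to the SSD's level, every new chunk, metadata included, lands on the spinning disk.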

On at least the project page, there is a suggestion to provide this
functionality.  In a way, it's essentially equivalent to the external
journal device supported by ext4, XFS, OCFS2 and some other filesystems,
and as such, I'd say it's a feature we should seriously consider looking
at implementing eventually, even if just for feature parity, and even if
we speed up metadata operations in BTRFS.


Yes, that's quite a good feature, not only for metadata speedup, but 
also for better metadata safety.


But on the other hand, I also suspect lock contention rather than device 
speed is causing slow btrfs metadata performance.


Fortunately, that's also on the project page.
But unfortunately, it may be much harder to implement than a special 
chunk allocation behavior.


Thanks,
Qu




Re: Use fast device only for metadata?

2016-02-08 Thread Austin S. Hemmelgarn

On 2016-02-08 08:20, Qu Wenruo wrote:

On 02/08/2016 08:24 PM, Austin S. Hemmelgarn wrote:

On 2016-02-07 15:59, Martin Steigerwald wrote:

On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:

On Sun, 07 Feb 2016 11:06:58 -0800,
Nikolaus Rath wrote:

Hello,

I have a large home directory on a spinning disk that I regularly
synchronize between different computers using unison. That takes ages,
even though the amount of changed files is typically small. I suspect
most of the time is spent walking through the file system and checking
mtimes.

So I was wondering if I could possibly speed-up this operation by
storing all btrfs metadata on a fast, SSD drive. It seems that
mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
file contents in single mode. However, I could not find a way to tell
btrfs to use a device *only* for metadata. Is there a way to do that?

Also, what is the difference between using "dup" and "raid1" for the
metadata?


You may want to try bcache. It will speed up random access which is
probably the main cause for your slow sync. Unfortunately it requires
you to reformat your btrfs partitions to add a bcache superblock. But
it's worth the effort.

I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
to typically 1.5-3 depending on how much data changed.


An alternative is using dm-cache; I think it doesn't need the
filesystem to be recreated.

That's correct, dm-cache can use a regular underlying storage device.
This of course has potential implications for a multi-device filesystem
(it can seriously confuse BTRFS and cause data corruption), but it works
just fine for a single device filesystem.  This makes it a bit easier to
test run, but also means you need more devices (internally it uses three:
one backing device, one cache device, and a metadata device for
persistently mapping between the two).  It's really easy to set up
though if you have a recent version of LVM built with dm-cache support.

In general, bcache takes a bit more setup, but avoids the multi-device
issues, and importantly, doesn't require LVM or dmsetup (which are
usually pretty big packages on many distros).  The caveat with bcache
though is that there have been issues in the past with data integrity
when used with BTRFS, but if you're on a recent kernel (at least 4.0 if
you're using BTRFS for actual data storage), you should have no issues.


And I just want to add more about using a device *only* for metadata.

The short answer is, unfortunately, NO.

1) Even using bcache/dm-cache, it may still cache small data writes

I'm not quite sure about dm-cache/bcache, but as long as the top-level
filesystem is Btrfs, it won't be possible to restrict data/metadata
to a specific device.

IIRC, while bcache or a similar method may cache most random r/w of
metadata, it's still quite possible that it caches a lot of random r/w
of data too.

And depending on the sector size (minimal data block size) and leaf size
(metadata block size), under a specific workload it's even more likely
to cache small data rather than metadata, as the default sectorsize is
4K while the default leafsize is 16K.

The mention of dm-cache/bcache was more intended as an alternative, 
since BTRFS currently can't do what Nikolaus was trying to achieve. 
Neither will give quite the performance profile that a dedicated 
metadata device might, but they should still significantly improve 
general performance.  In essence, these function for BTRFS like L2ARC on 
an SSD does for ZFS.


2) Btrfs has no special preference in chunk allocation.

Btrfs just allocates chunks according to the devices' unallocated space.
So even if there is a huge TB- or PB-sized spinning device and a GB-sized
SSD, btrfs will just treat them according to their unallocated space.

On at least the project page, there is a suggestion to provide this 
functionality.  In a way, it's essentially equivalent to the external 
journal device supported by ext4, XFS, OCFS2 and some other filesystems, 
and as such, I'd say it's a feature we should seriously consider looking 
at implementing eventually, even if just for feature parity, and even if 
we speed up metadata operations in BTRFS.




Re: [PATCH v2 10/10] fs: btrfs: Replace CURRENT_TIME by current_fs_time()

2016-02-08 Thread David Sterba
On Sat, Feb 06, 2016 at 11:57:21PM -0800, Deepa Dinamani wrote:
> CURRENT_TIME macro is not appropriate for filesystems as it
> doesn't use the right granularity for filesystem timestamps.
> Use current_fs_time() instead.
> 
> Signed-off-by: Deepa Dinamani 
> Cc: Chris Mason 
> Cc: Josef Bacik 
> Cc: David Sterba 
> Cc: linux-btrfs@vger.kernel.org

Reviewed-by: David Sterba 


Re: Use fast device only for metadata?

2016-02-08 Thread Qu Wenruo



On 02/08/2016 08:24 PM, Austin S. Hemmelgarn wrote:

On 2016-02-07 15:59, Martin Steigerwald wrote:

On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:

On Sun, 07 Feb 2016 11:06:58 -0800,
Nikolaus Rath wrote:

Hello,

I have a large home directory on a spinning disk that I regularly
synchronize between different computers using unison. That takes ages,
even though the amount of changed files is typically small. I suspect
most of the time is spent walking through the file system and checking
mtimes.

So I was wondering if I could possibly speed-up this operation by
storing all btrfs metadata on a fast, SSD drive. It seems that
mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
file contents in single mode. However, I could not find a way to tell
btrfs to use a device *only* for metadata. Is there a way to do that?

Also, what is the difference between using "dup" and "raid1" for the
metadata?


You may want to try bcache. It will speed up random access which is
probably the main cause for your slow sync. Unfortunately it requires
you to reformat your btrfs partitions to add a bcache superblock. But
it's worth the effort.

I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
to typically 1.5-3 depending on how much data changed.


An alternative is using dm-cache; I think it doesn't need the
filesystem to be recreated.

That's correct, dm-cache can use a regular underlying storage device.
This of course has potential implications for a multi-device filesystem
(it can seriously confuse BTRFS and cause data corruption), but it works
just fine for a single device filesystem.  This makes it a bit easier to
test run, but also means you need more devices (internally it uses three:
one backing device, one cache device, and a metadata device for
persistently mapping between the two).  It's really easy to set up
though if you have a recent version of LVM built with dm-cache support.

In general, bcache takes a bit more setup, but avoids the multi-device
issues, and importantly, doesn't require LVM or dmsetup (which are
usually pretty big packages on many distros).  The caveat with bcache
though is that there have been issues in the past with data integrity
when used with BTRFS, but if you're on a recent kernel (at least 4.0 if
you're using BTRFS for actual data storage), you should have no issues.



And I just want to add more about using a device *only* for metadata.

The short answer is, unfortunately, NO.

1) Even using bcache/dm-cache, it may still cache small data writes

I'm not quite sure about dm-cache/bcache, but as long as the top-level
filesystem is Btrfs, it won't be possible to restrict data/metadata
to a specific device.


IIRC, while bcache or a similar method may cache most random r/w of
metadata, it's still quite possible that it caches a lot of random r/w
of data too.


And depending on the sector size (minimal data block size) and leaf size
(metadata block size), under a specific workload it's even more likely
to cache small data rather than metadata, as the default sectorsize is
4K while the default leafsize is 16K.

2) Btrfs has no special preference in chunk allocation.

Btrfs just allocates chunks according to the devices' unallocated space.
So even if there is a huge TB- or PB-sized spinning device and a GB-sized
SSD, btrfs will just treat them according to their unallocated space.




BTW, to really locate the bottleneck, it's better to use perf to find 
out which functions btrfs spends most of its time in.


Although it's a known fact that btrfs is quite slow on metadata 
modification compared to other file systems, I'm still not sure 
whether that's the root cause.


Thanks,
Qu


Re: Fwd: Unmountable fs after power outage

2016-02-08 Thread Qu Wenruo



On 01/22/2016 11:25 PM, Hugo Mills wrote:

On Fri, Jan 22, 2016 at 04:11:53PM +0100, Radek Sprta wrote:

Hello everybody,

after a recent power outage, attempting to mount my btrfs partition
fails with the following error:

mount: wrong fs type, bad option, bad superblock on /dev/sda8,
missing codepage or helper program, or other error

This is what dmesg shows:
[ 1035.236081] BTRFS (device sda8): parent transid verify failed on
410124288 wanted 85753 found 85755
[ 1035.240756] BTRFS (device sda8): parent transid verify failed on
410124288 wanted 85753 found 85755
[ 1035.240780] BTRFS: failed to read tree root on sda8
[ 1035.252025] BTRFS: open_ctree failed


Try mounting with -orecovery. That's the main approach for dealing
with transid failures.


IIRC, the current "recovery" option will only try to use backup roots.
And that's why I am going to rename the mount option to "usebackuproot" 
in the next kernel release.


It's quite strange that the current kernel no longer has a mount option 
to ignore transid errors.
Just grep for "RECOVERY" in the btrfs module sources, and it should be 
quite easy to confirm this.





Below is the result of btrfs check:
Checking filesystem on /dev/sda8
UUID: b0486700-ff9f-4979-8735-257ff1428a0d
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 311702978854 bytes used err is 0
total csum bytes: 271057636
total tree bytes: 1859862528
total fs tree bytes: 1473953792
total extent tree bytes: 72876032
btree space waste bytes: 377740163
file data blocks allocated: 11278336921600
  referenced 975937630208
btrfs-progs v4.0


That looks reasonably promising -- nothing seriously damaged that
btrfs check could find, at least.


Quite strange.
IIRC, btrfsck should at least report the transid error, as done by the 
verify_parent_transid() function in disk-io.c.

Since btrfsck reports no error, I think btrfs-image should be able to 
dump the metadata without problems.


So, would you please create a btrfs-image dump with the following command?
# btrfs-image -c9 /dev/sda8 <output file>

Such a dump will only contain your metadata (the dir/file hierarchy, 
including dir/file names); no data will be dumped.


And if you think the filenames could leak important data, you can use 
'-ss' or '-s' to sanitize them.


Thanks,
Qu




I tried googling the "open_ctree failed" error, but couldn't find any
definite answers. Here's some additional info:


"open_ctree failed" is a really generic error message,
unfortunately. Almost every case that leads to a failure to mount will
output that message, so there's no single solution you can find from
just that error. The "parent transid verify failed" is a much better
indication of what's happened here, and the small difference in
transid numbers would indicate that -orecovery might actually have a
chance of working.
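The reasoning here rests on the gap between the wanted and found generation numbers. A small, hypothetical helper to pull those numbers out of kernel log text might look like this; the regex is written against the message format quoted in this thread and may not match every kernel version, and there is no documented threshold for what counts as a "small" gap.

```python
import re
from typing import List, Tuple

# \s+ tolerates the line wrapping seen in the quoted dmesg output.
TRANSID_RE = re.compile(
    r"parent transid verify failed on\s+(\d+)\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


def transid_gaps(log_text: str) -> List[Tuple[int, int, int, int]]:
    """Return (bytenr, wanted, found, found - wanted) for each transid error."""
    return [
        (int(b), int(w), int(f), int(f) - int(w))
        for b, w, f in TRANSID_RE.findall(log_text)
    ]


if __name__ == "__main__":
    dmesg = """\
[ 1035.236081] BTRFS (device sda8): parent transid verify failed on
410124288 wanted 85753 found 85755
[ 1035.240756] BTRFS (device sda8): parent transid verify failed on
410124288 wanted 85753 found 85755"""
    for bytenr, wanted, found, gap in transid_gaps(dmesg):
        print(f"block {bytenr}: wanted {wanted}, found {found}, gap {gap}")
```

For the log above both entries report a gap of 2 generations.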

Hugo.


uname -a:
Linux Computer 4.2.0-22-generic #27-Ubuntu SMP Thu Dec 17 22:57:08 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux

btrfs fi show:
Label: none  uuid: b0486700-ff9f-4979-8735-257ff1428a0d
	Total devices 1 FS bytes used 290.37GiB
	devid    1 size 301.32GiB used 295.04GiB path /dev/sda8

Thanks in advance for any help,
Radek





Re: Use fast device only for metadata?

2016-02-08 Thread Duncan
Nikolaus Rath posted on Mon, 08 Feb 2016 13:44:17 -0800 as excerpted:

> Otherwise I'll give bcache a shot. I've avoided it so far because of the
> need to reformat and because of rumours that it doesn't work well with
> LVM or BTRFS. But it sounds as if that's not the case..

Bcache used to have problems with btrfs, but as I and others have 
mentioned, there are people on the list known to be using btrfs with 
bcache, and it has been working fine for quite some time now.

Bcache vs. LVM, OTOH, I know nothing about.  Tho to be fair I guess I'm a 
bit anti-LVM biased myself, as it seems a bit too much complexity for the 
offered advantages, and when I tried it some time ago along with mdraid, 
I decided to keep the mdraid, but kill the lvm as too complex to be 
confident I could manage it correctly under the pressures of a disaster 
recovery situation, possibly with limited access to documentation, 
manpages, other recovery tools, etc.  MDRaid, OTOH, was easier to 
administer, in part because it's possible to assemble mdraid direct from 
the kernel without userspace (initr* or the like if / is on it), and I 
successfully managed it thru various issues over some time.

Of course these days I use multi-device btrfs directly, no mdraid, and a 
multi-device btrfs root unfortunately does seem to require an initr*, but 
its other advantages outweigh the additional complexity of having to use 
an initr*, so...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



"layout" of a six drive raid10

2016-02-08 Thread boli
Hi

I'm trying to figure out what a six drive btrfs raid10 would look like. The 
example at 

 seems ambiguous to me.

It could mean that stripes are split over two raid1 sets of three devices each. 
The sentence "Every stripe is split across to exactly 2 RAID-1 sets" would lead 
me to believe this.

However, earlier it says for raid0 that "stripe[s are] split across as many 
devices as possible". Which for six drives would be: stripes are split over 
three raid1 sets of two devices each.

Can anyone enlighten me as to which is correct?


Reason I'm asking is that I'm deciding on a suitable raid level for a new DIY 
NAS box. I'd rather not use btrfs raid6 (for now). The first alternative I 
thought of was raid10. Later I learned how btrfs raid1 works and figured it 
might be better suited for my use case: Striping the data over multiple raid1 
sets doesn't really help, as transfer from/to my box will be limited by gigabit 
ethernet anyway, and a single drive can saturate that.

Thoughts on this would also be appreciated.


As a bonus, I was wondering how btrfs raid1 arrays are laid out in general, in 
particular with even and odd numbers of drives. A pair is trivial. For three 
drives I imagine a "ring setup", with each drive sharing half of its data with 
one neighbour and half with the other. But how is it with four drives – are 
they organized as two pairs, or four-way, or …

Cheers, boli


Re: Use fast device only for metadata?

2016-02-08 Thread Nikolaus Rath
On Feb 07 2016, Martin Steigerwald  wrote:
> On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:
>> On Sun, 07 Feb 2016 11:06:58 -0800,
>> 
>> Nikolaus Rath wrote:
>> > Hello,
>> > 
>> > I have a large home directory on a spinning disk that I regularly
>> > synchronize between different computers using unison. That takes ages,
>> > even though the amount of changed files is typically small. I suspect
>> > most if the time is spend walking through the file system and checking
>> > mtimes.
>> > 
>> > So I was wondering if I could possibly speed-up this operation by
>> > storing all btrfs metadata on a fast, SSD drive. It seems that
>> > mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
>> > file contents in single mode. However, I could not find a way to tell
>> > btrfs to use a device *only* for metadata. Is there a way to do that?
>> > 
>> > Also, what is the difference between using "dup" and "raid1" for the
>> > metadata?
>> 
>> You may want to try bcache. It will speedup random access which is
>> probably the main cause for your slow sync. Unfortunately it requires
>> you to reformat your btrfs partitions to add a bcache superblock. But
>> it's worth the efforts.
>> 
>> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
>> to typically 1.5-3 depending on how much data changed.
>
> An alternative is using dm-cache, I think it doesn't need to recreate the 
> filesystem.

Yes, I tried that already but it didn't improve things at all. I wrote a
message to the lvm list though, so maybe someone will be able to help.

Otherwise I'll give bcache a shot. I've avoided it so far because of the
need to reformat and because of rumours that it doesn't work well with
LVM or BTRFS. But it sounds as if that's not the case..


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: "layout" of a six drive raid10

2016-02-08 Thread Hugo Mills
On Mon, Feb 08, 2016 at 11:19:52PM +0100, boli wrote:
> Hi
> 
> I'm trying to figure out what a six drive btrfs raid10 would look like. The 
> example at 
> 
>  seems ambiguous to me.
> 
> It could mean that stripes are split over two raid1 sets of three devices 
> each. The sentence "Every stripe is split across to exactly 2 RAID-1 sets" 
> would lead me to believe this.
> 
> However, earlier it says for raid0 that "stripe[s are] split across as many 
> devices as possible". Which for six drives would be: stripes are split over 
> three raid1 sets of two devices each.
> 
> Can anyone enlighten me as to which is correct?

   Both? :)

   You'll find that on six devices, you'll have six chunks allocated
at the same time: A1, B1, C1, A2, B2, C2. The "2" chunks are
duplicates of the corresponding "1" chunks. The "A", "B", "C" chunks
are the alternate stripes. There's no hierarchy of RAID-0-then-RAID-1
or RAID-1-then-RAID-0.
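To make the A/B/C picture concrete, here is a sketch of how a byte offset inside one six-device RAID-10 block group could be mapped to a pair of chunks. It assumes a 64 KiB stripe length and that the six chunks are stored mirror-pairs-first (indices 0/1 are the two copies of stripe A, 2/3 of B, 4/5 of C); the names are hypothetical and this is an illustration, not the btrfs code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this sketch: six chunks, two copies, 64 KiB stripes. */
#define NUM_STRIPES 6          /* chunks in the block group */
#define SUB_STRIPES 2          /* copies of each stripe */
#define STRIPE_LEN  (64 * 1024)

struct raid10_loc {
	int copy0;             /* chunk index holding the first copy */
	int copy1;             /* chunk index holding the mirror */
	uint64_t chunk_off;    /* byte offset inside both of those chunks */
};

/* Map a byte offset within the block group to the two chunks holding it. */
static struct raid10_loc raid10_map(uint64_t offset)
{
	const uint64_t factor = NUM_STRIPES / SUB_STRIPES; /* A, B, C */
	uint64_t stripe_nr = offset / STRIPE_LEN;
	uint64_t in_stripe = offset % STRIPE_LEN;
	struct raid10_loc loc;

	/* consecutive stripes rotate A, B, C, A, ... */
	loc.copy0 = (int)(stripe_nr % factor) * SUB_STRIPES;
	loc.copy1 = loc.copy0 + 1;
	/* each full A-B-C round advances one stripe into every chunk */
	loc.chunk_off = (stripe_nr / factor) * STRIPE_LEN + in_stripe;
	return loc;
}
```

Consecutive 64 KiB pieces go to A, B, C, A, ..., and each piece is written to two chunks on two different devices, so there is no separate RAID-0 layer over RAID-1 sets (or the reverse), just one block group containing both properties.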

> Reason I'm asking is that I'm deciding on a suitable raid level for a new DIY 
> NAS box. I'd rather not use btrfs raid6 (for now). The first alternative I 
> thought of was raid10. Later I learned how btrfs raid1 works and figured it 
> might be better suited for my use case: Striping the data over multiple raid1 
> sets doesn't really help, as transfer from/to my box will be limited by 
> gigabit ethernet anyway, and a single drive can saturate that.
> 
> Thoughts on this would also be appreciated.

> As a bonus I was wondering how btrfs raid1 are layed out in general,
> in particular with even and odd numbers of drives. A pair is
> trivial. For three drives I think a "ring setup" with each drive
> sharing half of its data with another drive. But how is it with four
> drives – are they organized as two pairs, or four-way, or …

   The fundamental unit of space allocation at this level is the chunk
-- a 1 GiB unit of storage on one device. (Or 256 MiB for metadata).
Chunks are allocated in block groups to form the RAID behaviour of the
FS.

   So, single mode will allocate one chunk in a block group. RAID-1
will allocate two chunks in a block group. RAID-10 will allocate N
chunks in a block group, where N is the largest even number equal to
or smaller than the number of devices [with space on]. RAID-0, -5
and -6 will allocate N chunks, where N is the number of devices [with
space on].
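The per-profile chunk counts can be written down as a small function. This is a sketch, not the btrfs allocator: the enum names are hypothetical, the minimum device counts are assumptions (they have changed between kernel versions), and RAID-0 is shown striping across every device with space, as the btrfs wiki describes it.

```c
#include <assert.h>

enum profile { SINGLE, RAID0, RAID1, RAID10, RAID5, RAID6 };

/*
 * Number of chunks in one new block group, given how many devices still
 * have unallocated space. Returns 0 if the profile cannot be satisfied.
 */
static int chunks_per_block_group(enum profile p, int ndevs)
{
	switch (p) {
	case SINGLE:
		return ndevs >= 1 ? 1 : 0;
	case RAID1:
		return ndevs >= 2 ? 2 : 0;          /* always exactly two copies */
	case RAID0:
		return ndevs >= 2 ? ndevs : 0;      /* stripe over every device */
	case RAID10:
		/* largest even number <= ndevs; minimum of 4 assumed */
		return ndevs >= 4 ? ndevs - (ndevs % 2) : 0;
	case RAID5:
		return ndevs >= 2 ? ndevs : 0;
	case RAID6:
		return ndevs >= 3 ? ndevs : 0;
	}
	return 0;
}
```

With six devices that all have free space, RAID-10 gives six chunks per block group (three stripes, two copies each), while RAID-1 gives two, whatever the device count.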

   When chunks are to be allocated, the devices are ordered by the
amount of free space on them. The chunks are allocated to devices in
that order.

   So, if you have three equal devices, 1, 2, 3, RAID-1 chunks will be
allocated to them as: 1+2, 3+1, 2+3, repeat.

   With one device larger than the others (say, device 3), it'll start
as: 3+1, 3+2, 3+1, 3+2, repeating until all three devices have equal
free space, and then going back to the pattern above.
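The ordering rule can be simulated directly. In this sketch, free space is counted in whole chunks, ties between devices with equal free space go to the lower device index (an assumption; the kernel's sort does not promise any particular tie order), and devices are numbered from 0, so devices 1, 2, 3 above are indices 0, 1, 2 here. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Allocate one RAID-1 block group: pick the two devices with the most free
 * chunks (ties go to the lower index), take one chunk from each, and report
 * the chosen devices in used[0] and used[1].
 * Returns 0 on success, -1 if fewer than two devices have room.
 */
static int alloc_raid1(long free_chunks[], int ndevs, int used[2])
{
	for (int k = 0; k < 2; k++) {
		int best = -1;

		for (int i = 0; i < ndevs; i++) {
			if (k == 1 && i == used[0])
				continue;       /* the copies must be on different devices */
			if (free_chunks[i] < 1)
				continue;       /* no room left on this device */
			if (best < 0 || free_chunks[i] > free_chunks[best])
				best = i;       /* strict '>' keeps the lower index on ties */
		}
		if (best < 0)
			return -1;
		used[k] = best;
	}
	free_chunks[used[0]]--;
	free_chunks[used[1]]--;
	return 0;
}
```

Starting from three devices with 10 free chunks each, three calls choose (0,1), (2,0) and (1,2), i.e. 1+2, 3+1, 2+3. Starting from 10, 10 and 14, the first two calls choose (2,0) and (2,1), the 3+1, 3+2 pattern.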

   Hugo.

-- 
Hugo Mills | Well, you don't get to be a kernel hacker simply by
hugo@... carfax.org.uk | looking good in Speedos.
http://carfax.org.uk/  |
PGP: E2AB1DE4  | Rusty Russell




[PATCH 02/23] xfstests: filter whitespace in 128 and 132

2016-02-08 Thread Darrick J. Wong
Seems either I have a different lsattr version, or different mount points
cause differences in the golden output.  Send the lsattr output through
the whitespace filter so that it works everywhere.

The lsattr output /does/ change depending on mountpoints.  Ick.  I'd
actually changed it to the long format output because line length in
the short format changes every time the flags change.

Signed-off-by: Christoph Hellwig 
[darrick.w...@oracle.com: update changelog]
Signed-off-by: Darrick J. Wong 
---
 tests/xfs/128     |  2 +-
 tests/xfs/128.out |  8
 tests/xfs/132     | 10 +-
 tests/xfs/132.out | 40
 4 files changed, 30 insertions(+), 30 deletions(-)
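Why a whitespace filter makes the golden output portable: the padding lsattr puts between the file name and the flags varies (the commit message notes it depends on the mount point). A small C illustration of collapsing each run of blanks to a single space follows; collapse_blanks() is a hypothetical name, and this is not the actual _filter_spaces helper from common/filter, only a sketch of the effect the patch relies on.

```c
#include <assert.h>
#include <string.h>

/* Collapse every run of spaces or tabs into a single space, in place. */
static void collapse_blanks(char *s)
{
	char *out = s;
	int in_blank = 0;

	for (const char *in = s; *in; in++) {
		if (*in == ' ' || *in == '\t') {
			if (!in_blank)
				*out++ = ' ';   /* keep one space per run */
			in_blank = 1;
		} else {
			*out++ = *in;
			in_blank = 0;
		}
	}
	*out = '\0';
}
```

Applied to a line such as "SCRATCH_MNT/test-128/file1  ---", it yields "SCRATCH_MNT/test-128/file1 ---", which is the form the updated golden output expects.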


diff --git a/tests/xfs/128 b/tests/xfs/128
index a96291a..c9547fb 100755
--- a/tests/xfs/128
+++ b/tests/xfs/128
@@ -97,7 +97,7 @@ c13=$(_md5_checksum "$testdir/file3")
 c14=$(_md5_checksum "$testdir/file4")
 
 echo "Defragment"
-lsattr -l "$testdir/" | _filter_scratch
+lsattr -l "$testdir/" | _filter_scratch | _filter_spaces
 xfs_fsr -v -d "$testdir/file1" >> "$seqres.full"
 xfs_fsr -v -d "$testdir/file2" >> "$seqres.full" # fsr probably breaks the link
 xfs_fsr -v -d "$testdir/file3" >> "$seqres.full" # fsr probably breaks the link
diff --git a/tests/xfs/128.out b/tests/xfs/128.out
index 7e72dcd..0ac06db 100644
--- a/tests/xfs/128.out
+++ b/tests/xfs/128.out
@@ -11,10 +11,10 @@ c650f1cf6c9f07b22e3e21ec7d49ded5  SCRATCH_MNT/test-128/file2
 56ed2f712c91e035adeeb26ed105a982  SCRATCH_MNT/test-128/file3
 b81534f439aac5c34ce3ed60a03eba70  SCRATCH_MNT/test-128/file4
 Defragment
-SCRATCH_MNT/test-128/file1  ---
-SCRATCH_MNT/test-128/file2  ---
-SCRATCH_MNT/test-128/file3  ---
-SCRATCH_MNT/test-128/file4  ---
+SCRATCH_MNT/test-128/file1 ---
+SCRATCH_MNT/test-128/file2 ---
+SCRATCH_MNT/test-128/file3 ---
+SCRATCH_MNT/test-128/file4 ---
 b81534f439aac5c34ce3ed60a03eba70  SCRATCH_MNT/test-128/file1
 c650f1cf6c9f07b22e3e21ec7d49ded5  SCRATCH_MNT/test-128/file2
 56ed2f712c91e035adeeb26ed105a982  SCRATCH_MNT/test-128/file3
diff --git a/tests/xfs/132 b/tests/xfs/132
index 79a6d57..9c57c3b 100755
--- a/tests/xfs/132
+++ b/tests/xfs/132
@@ -87,32 +87,32 @@ for i in `seq 2 $nr`; do
 done
 _test_remount
 free_blocks1=$(stat -f "$testdir" -c '%f')
-lsattr -l $testdir/ | _filter_test_dir
+lsattr -l $testdir/ | _filter_test_dir | _filter_spaces
 
 echo "funshare part of a file"
 "$XFS_IO_PROG" -f -c "falloc 0 $((sz / 2))" "$testdir/file2"
 _test_remount
-lsattr -l $testdir/ | _filter_test_dir
+lsattr -l $testdir/ | _filter_test_dir | _filter_spaces
 
 echo "funshare some of the copies"
 "$XFS_IO_PROG" -f -c "falloc 0 $sz" "$testdir/file2"
 "$XFS_IO_PROG" -f -c "falloc 0 $sz" "$testdir/file3"
 _test_remount
 free_blocks2=$(stat -f "$testdir" -c '%f')
-lsattr -l $testdir/ | _filter_test_dir
+lsattr -l $testdir/ | _filter_test_dir | _filter_spaces
 
 echo "funshare the rest of the files"
 "$XFS_IO_PROG" -f -c "falloc 0 $sz" "$testdir/file4"
 "$XFS_IO_PROG" -f -c "falloc 0 $sz" "$testdir/file1"
 _test_remount
 free_blocks3=$(stat -f "$testdir" -c '%f')
-lsattr -l $testdir/ | _filter_test_dir
+lsattr -l $testdir/ | _filter_test_dir | _filter_spaces
 
 echo "Rewrite the original file"
 _pwrite_byte 0x65 0 $sz "$testdir/file1" >> "$seqres.full"
 _test_remount
 free_blocks4=$(stat -f "$testdir" -c '%f')
-lsattr -l $testdir/ | _filter_test_dir
+lsattr -l $testdir/ | _filter_test_dir | _filter_spaces
 #echo $free_blocks0 $free_blocks1 $free_blocks2 $free_blocks3 $free_blocks4
 
 _within_tolerance "free blocks after reflinking" $free_blocks1 $((free_blocks0 - blks)) $margin -v
diff --git a/tests/xfs/132.out b/tests/xfs/132.out
index fd2b7bd..f32db7d 100644
--- a/tests/xfs/132.out
+++ b/tests/xfs/132.out
@@ -1,30 +1,30 @@
 QA output created by 132
 Create the original file blocks
 Create the reflink copies
-TEST_DIR/test-132/file1  ---
-TEST_DIR/test-132/file2  ---
-TEST_DIR/test-132/file3  ---
-TEST_DIR/test-132/file4  ---
+TEST_DIR/test-132/file1 ---
+TEST_DIR/test-132/file2 ---
+TEST_DIR/test-132/file3 ---
+TEST_DIR/test-132/file4 ---
 funshare part of a file
-TEST_DIR/test-132/file1  ---
-TEST_DIR/test-132/file2  ---
-TEST_DIR/test-132/file3  ---
-TEST_DIR/test-132/file4  ---
+TEST_DIR/test-132/file1 ---
+TEST_DIR/test-132/file2 ---
+TEST_DIR/test-132/file3 ---
+TEST_DIR/test-132/file4 ---
 funshare some of the copies
-TEST_DIR/test-132/file1  ---
-TEST_DIR/test-132/file2  No_COW
-TEST_DIR/test-132/file3  No_COW
-TEST_DIR/test-132/file4  ---
+TEST_DIR/test-132/file1 ---
+TEST_DIR/test-132/file2 No_COW
+TEST_DIR/test-132/file3 No_COW
+TEST_DIR/test-132/file4 ---
 funshare the rest of the files
-TEST_DIR/test-132/file1  No_COW
-TEST_DIR/test-132/file2  No_COW
-TEST_DIR/test-132/file3  No_COW
-TEST_DIR/test-132/file4  No_COW

[PATCH 06/23] dio unwritten conversion bug tests

2016-02-08 Thread Darrick J. Wong
Check that we don't expose old disk contents when a directio write to
an unwritten extent fails due to IO errors.  This primarily affects
XFS and ext4.

Signed-off-by: Darrick J. Wong 
---
 .gitignore                  |   1
 src/aio-dio-regress/aiocp.c | 489 +++
 tests/generic/250           | 104 +
 tests/generic/250.out       |  10 +
 tests/generic/252           | 107 +
 tests/generic/252.out       |  10 +
 tests/generic/group         |   2
 7 files changed, 723 insertions(+)
 create mode 100644 src/aio-dio-regress/aiocp.c
 create mode 100755 tests/generic/250
 create mode 100644 tests/generic/250.out
 create mode 100755 tests/generic/252
 create mode 100644 tests/generic/252.out


diff --git a/.gitignore b/.gitignore
index a6f47d3..bbe7c1a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -112,6 +112,7 @@
 /src/aio-dio-regress/aio-free-ring-with-bogus-nr-pages
 /src/aio-dio-regress/aio-io-setup-with-nonwritable-context-pointer
 /src/aio-dio-regress/aio-last-ref-held-by-io
+/src/aio-dio-regress/aiocp
 /src/aio-dio-regress/aiodio_sparse2
 /src/aio-dio-regress/aio-dio-eof-race
 /src/cloner
diff --git a/src/aio-dio-regress/aiocp.c b/src/aio-dio-regress/aiocp.c
new file mode 100644
index 000..1abff9c
--- /dev/null
+++ b/src/aio-dio-regress/aiocp.c
@@ -0,0 +1,489 @@
+/*
+ * Copyright (c) 2004 Daniel McNeil 
+ *   2004 Open Source Development Lab
+ *   This program is free software;  you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation; either version 2 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY;  without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
+ *   the GNU General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program;  if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Module: .c
+ */
+
+/*
+ * Change History:
+ *
+ *
+ * version of copy command using async i/o
+ * From:   Stephen Hemminger 
+ * Modified by Daniel McNeil  for testing aio.
+ * - added -a alignment
+ * - added -b blksize option 
+ * _ added -s size option
+ * - added -f open_flag option
+ * - added -w (no write) option (reads from source only)
+ * - added -n (num aio) option 
+ * - added -z (zero dest) opton (writes zeros to dest only)
+ * - added -D delay_ms option
+ *  - 2/2004  Marty Ridgeway (mri...@us.ibm.com) Changes to adapt to LTP
+ *
+ * Copy file by using a async I/O state machine.
+ * 1. Start read request
+ * 2. When read completes turn it into a write request
+ * 3. When write completes decrement counter and free resources
+ *
+ *
+ * Usage: aiocp [-b blksize] -n [num_aio] [-w] [-z] [-s filesize] 
+ * [-f DIRECT|TRUNC|CREAT|SYNC|LARGEFILE] src dest
+ */
+
+//#define _GNU_SOURCE
+//#define DEBUG 1
+#undef DEBUG
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define AIO_BLKSIZE (64*1024)
+#define AIO_MAXIO  32
+
+static int aio_blksize = AIO_BLKSIZE;
+static int aio_maxio = AIO_MAXIO;
+
+static int busy = 0;   // # of I/O's in flight
+static int tocopy = 0; // # of blocks left to copy
+static int srcfd;  // source fd
+static int dstfd = -1; // destination file descriptor
+static const char *dstname = NULL;
+static const char *srcname = NULL;
+static int source_open_flag = O_RDONLY;/* open flags on source file */
+static int dest_open_flag = O_WRONLY;  /* open flags on dest file */
+static int no_write;   /* do not write */
+static int zero;   /* write zero's only */
+
+static int debug;
+static int count_io_q_waits;   /* how many time io_queue_wait called */
+
+struct iocb **iocb_free;   /* array of pointers to iocb */
+int iocb_free_count;   /* current free count */
+int alignment = 512;   /* buffer alignment */
+
+struct timeval delay;  /* delay between i/o */
+
+int init_iocb(int n, int iosize)
+{
+   void *buf;
+   int i;
+
+   if ((iocb_free = malloc(n * sizeof(struct iocb *))) == 0) {
+   return -1;
+   }
+
+   for (i = 0; i < n; i++) {
+   if (!(iocb_free[i] = (struct iocb *) malloc(sizeof(struct iocb))))
+   return -1;
+   if (posix_memalign(&buf, alignment, iosize))
+   return -1;
+   return -1;
+   if (debug > 1) {
+   printf("buf allocated at 0x%p, align:%d\n",
+   buf, 

[PATCH v4.1 00/23] xfstests: test the nfs/cifs/btrfs/xfs reflink/dedupe ioctls

2016-02-08 Thread Darrick J. Wong
Happy New Year!

Dave Chinner: I've renumbered the new tests and pushed to github[3] if
you'd like to pull.

This is a (no longer) small patch set against the reflink/dedupe test
cases in xfstests.  The first four patches fix errors in the existing
reflink tests, some of which are from Christoph Hellwig.

Patches 5-6 refactor the dmerror code so that we can use it to
simulate transient IO errors, then use this code to test that
unwritten extent conversion does NOT happen after a directio write to
an unwritten extent hits a disk error.   Due to a bug in the VFS
directio code, ext4 can disclose stale disk contents if an aio dio
write fails; XFS suffers this problem for any failing dio write to an
unwritten extent.  Christoph's kernel patchset titled "vfs/xfs:
directio updates to ease COW handling V2" (and a separate ext4 warning
cleanup) is needed to fix this.

Patches 7-9, 13, 15, 17, 18, 20, 21, and 23 exercise various parts
of the copy on write behavior that are necessary to support shared
blocks.  The earlier patches focus on correct CoW behavior in the
presence of IO errors during the copy-write, and the later patches
focus on XFS' new cow-extent-size hint that greatly reduces
fragmentation due to copy on write behavior by encouraging the
allocator to allocate larger extents of replacement blocks.

Patches 10-12 and 14 perform stress testing on reflink and CoW to
check the behaviors when we get close to maximum refcount, when we
specify obnoxiously large offsets and lengths, and when we try to
reflink millions of extents at a time.

Patch 16 tests quota accounting behavior when reflink is enabled.

Patch 19 adds a few tests for the XFS reverse mapping btree to ensure
that things like metadump and growfs work correctly.

Patch 22 checks that get_bmapx and fiemap (on XFS) correctly flag
extents as having shared blocks.  XFS now follows btrfs and ocfs2
FIEMAP behavior such that if any blocks of a file's extent are shared,
the whole extent is marked shared.  This is in contrast to earlier
XFS-only behavior that reported shared and non-shared regions as
separate extents.
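The reporting rule can be stated in a few lines of C. This illustrates the policy only, with hypothetical function names; a real program would read the per-extent flag (FIEMAP_EXTENT_SHARED in <linux/fiemap.h>) from the FS_IOC_FIEMAP ioctl rather than computing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* New behaviour: an extent is flagged shared if any one of its blocks is. */
static bool extent_is_shared(const bool block_shared[], size_t nblocks)
{
	for (size_t i = 0; i < nblocks; i++)
		if (block_shared[i])
			return true;
	return false;
}

/*
 * Earlier XFS-only behaviour: each run of blocks with the same shared state
 * was reported as its own extent. Count how many extents that would give.
 */
static size_t split_extent_count(const bool block_shared[], size_t nblocks)
{
	size_t runs = 0;

	for (size_t i = 0; i < nblocks; i++)
		if (i == 0 || block_shared[i] != block_shared[i - 1])
			runs++;
	return runs;
}
```

For a four-block extent whose middle two blocks are shared, the new rule reports one extent flagged shared, where the old behaviour would have reported three extents.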

If you're going to start using this mess, you probably ought to just
pull from my github trees for kernel[1], xfsprogs[2], xfstests[3],
xfs-docs[4], and man-pages[5].  All tests should pass on XFS.   I
tried btrfs this weekend and it failed 166, 175, 182, 266, 271, 272,
278, 281, 297, 298, 304, 333, and 334.  ocfs2 (when I jury-rigged it
to run the cp_reflink tests) seemed to have a quota bug and crashed
hard in 284 (but was otherwise fine).

Comments and questions are, as always, welcome.

--D

[1] https://github.com/djwong/linux/tree/for-dave
[2] https://github.com/djwong/xfsprogs/tree/for-dave
[3] https://github.com/djwong/xfstests/tree/for-dave
[4] https://github.com/djwong/xfs-documentation/tree/for-dave
[5] https://github.com/djwong/man-pages/commits/for-mtk


[PATCH 01/23] generic/182: this is a dedupe test, check for dedupe

2016-02-08 Thread Darrick J. Wong
Since this test examines dedupe behavior, the documentation should
say 'dedupe', not 'reflink'.  Furthermore, the feature checks must
look for working dedupe functionality, not reflink functionality.

Signed-off-by: Darrick J. Wong 
[h...@lst.de: add the test for dedupe support]
Signed-off-by: Christoph Hellwig 
---
 tests/generic/182 | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)


diff --git a/tests/generic/182 b/tests/generic/182
index bf5cd38..ef10af8 100755
--- a/tests/generic/182
+++ b/tests/generic/182
@@ -1,10 +1,10 @@
 #! /bin/bash
 # FS QA Test No. 182
 #
-# Test the convention that reflink with length == 0 means "to the end of fileA"
+# Test the convention that dedupe with length == 0 means "to the end of fileA"
 #   - Create a file.
-#   - Try to reflink "zero" bytes (which means reflink to EOF).
-#   - Check that the reflink happened.
+#   - Try to dedupe "zero" bytes (which means dedupe to EOF).
+#   - Check that the dedupe happened.
 #
 #---
 # Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
@@ -45,8 +45,7 @@ _cleanup()
 
 # real QA test starts here
 _supported_os Linux
-_require_test_reflink
-_require_cp_reflink
+_require_test_dedupe
 
 rm -f "$seqres.full"
 



[PATCH 03/23] xfstests: make _scratch_mkfs_blocksized usable

2016-02-08 Thread Darrick J. Wong
The default mkfs.xfs options contain -b size=4096, so all tests
using _scratch_mkfs_blocksized won't actually run unless those
options are changed.  As we're trying to specifically test 1k
blocks, we should always override the default option.

v2: Move the function to common/rc

Signed-off-by: Christoph Hellwig 
[darrick.w...@oracle.com: move function to common/rc]
Signed-off-by: Darrick J. Wong 
---
 common/rc      | 24
 common/reflink | 30 --
 2 files changed, 24 insertions(+), 30 deletions(-)


diff --git a/common/rc b/common/rc
index f08cb3a..863d4b3 100644
--- a/common/rc
+++ b/common/rc
@@ -881,6 +881,30 @@ _scratch_mkfs_geom()
 _scratch_mkfs
 }
 
+# Create fs of certain blocksize on scratch device
+# _scratch_mkfs_blocksized blocksize
+_scratch_mkfs_blocksized()
+{
+    blocksize=$1
+
+    re='^[0-9]+$'
+    if ! [[ $blocksize =~ $re ]] ; then
+        _notrun "error: _scratch_mkfs_sized: block size \"$blocksize\" not an integer."
+    fi
+
+    case $FSTYP in
+    xfs)
+        _scratch_mkfs_xfs $MKFS_OPTIONS -b size=$blocksize
+        ;;
+    ext2|ext3|ext4|ocfs2)
+        ${MKFS_PROG}.$FSTYP -F $MKFS_OPTIONS -b $blocksize $SCRATCH_DEV
+        ;;
+    *)
+        _notrun "Filesystem $FSTYP not supported in _scratch_mkfs_blocksized"
+        ;;
+    esac
+}
+
 _scratch_resvblks()
 {
case $FSTYP in
diff --git a/common/reflink b/common/reflink
index 8638aba..3d6a8c1 100644
--- a/common/reflink
+++ b/common/reflink
@@ -187,33 +187,3 @@ _dedupe_range() {
 
"$XFS_IO_PROG" $xfs_io_args -f -c "dedupe $file1 $offset1 $offset2 $len" "$file2"
 }
-
-# Create fs of certain blocksize on scratch device
-# _scratch_mkfs_blocksized blocksize
-_scratch_mkfs_blocksized()
-{
-    blocksize=$1
-
-    re='^[0-9]+$'
-    if ! [[ $blocksize =~ $re ]] ; then
-        _notrun "error: _scratch_mkfs_sized: block size \"$blocksize\" not an integer."
-    fi
-
-    case $FSTYP in
-    xfs)
-        # don't override MKFS_OPTIONS that set a block size.
-        echo $MKFS_OPTIONS |egrep -q "b?size="
-        if [ $? -eq 0 ]; then
-            _scratch_mkfs_xfs
-        else
-            _scratch_mkfs_xfs -b size=$blocksize
-        fi
-        ;;
-    ext2|ext3|ext4|ocfs2)
-        ${MKFS_PROG}.$FSTYP -F $MKFS_OPTIONS -b $blocksize $SCRATCH_DEV
-        ;;
-    *)
-        _notrun "Filesystem $FSTYP not supported in _scratch_mkfs_blocksized"
-        ;;
-    esac
-}



[PATCH 07/23] reflink: test intersecting CoW and falloc/fpunch/fzero/fcollapse/finsert/ftrunc

2016-02-08 Thread Darrick J. Wong
Ensure that we correctly handle a CoW operation immediately followed
by a truncate, falloc, fpunch, fzero, fcollapse, and finsert operation
in the middle of the CoW'd region before any flush can occur.

Signed-off-by: Darrick J. Wong 
---
 tests/generic/253     | 93 +++
 tests/generic/253.out | 13 +++
 tests/generic/254     | 93 +++
 tests/generic/254.out | 13 +++
 tests/generic/259     | 93 +++
 tests/generic/259.out | 13 +++
 tests/generic/261     | 93 +++
 tests/generic/261.out | 13 +++
 tests/generic/262     | 96 +
 tests/generic/262.out | 13 +++
 tests/generic/264     | 93 +++
 tests/generic/264.out | 13 +++
 tests/generic/group   |  6 +++
 13 files changed, 645 insertions(+)
 create mode 100755 tests/generic/253
 create mode 100644 tests/generic/253.out
 create mode 100755 tests/generic/254
 create mode 100644 tests/generic/254.out
 create mode 100755 tests/generic/259
 create mode 100644 tests/generic/259.out
 create mode 100755 tests/generic/261
 create mode 100644 tests/generic/261.out
 create mode 100755 tests/generic/262
 create mode 100644 tests/generic/262.out
 create mode 100755 tests/generic/264
 create mode 100644 tests/generic/264.out
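For reference, the end state generic/253 expects is easy to model: four 64 KiB blocks of 0x61, blocks 1-2 overwritten with 0x62, then a truncate to two blocks, leaving one block of 0x61 followed by one block of 0x62. The C helper below (hypothetical name expected_253()) builds that buffer; it mirrors the xfs_io commands in the test, not anything in xfstests itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Expected contents of file2 / file2.chk after the CoW + truncate step.
 * Returns a malloc'd buffer (caller frees) whose first *len bytes are the
 * expected file, or NULL if nr is too small for the test's write pattern.
 */
static unsigned char *expected_253(size_t blksz, size_t nr, size_t *len)
{
	unsigned char *buf;

	if (nr < 3)
		return NULL;                      /* pwrite covers blocks 1 and 2 */
	buf = malloc(blksz * nr);
	if (!buf)
		return NULL;
	memset(buf, 0x61, blksz * nr);            /* _pwrite_byte 0x61 0 $((blksz * nr)) */
	memset(buf + blksz, 0x62, 2 * blksz);     /* pwrite -S 0x62 $blksz $((blksz * 2)) */
	*len = 2 * blksz;                         /* truncate $((blksz * 2)) */
	return buf;
}
```

The test itself compares md5sums of file2 (the reflinked copy) and file2.chk (an unshared file given the same writes), so any difference points at the CoW path rather than at the expected data.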


diff --git a/tests/generic/253 b/tests/generic/253
new file mode 100755
index 000..d8e0840
--- /dev/null
+++ b/tests/generic/253
@@ -0,0 +1,93 @@
+#! /bin/bash
+# FS QA Test No. 253
+#
+# Truncate a file at midway through a CoW region.
+#
+# This test is dependent on the system page size, so we cannot use md5 in
+# the golden output; we can only compare to a check file.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -rf "$tmp".*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+_require_xfs_io_command "truncate"
+
+rm -f "$seqres.full"
+
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf $testdir
+mkdir $testdir
+
+blksz=65536
+nr=4
+
+echo "Create the original files"
+_pwrite_byte 0x61 0 $((blksz * nr)) "$testdir/file1" >> "$seqres.full"
+_cp_reflink "$testdir/file1" "$testdir/file2" >> "$seqres.full"
+_pwrite_byte 0x61 0 $((blksz * nr)) "$testdir/file2.chk" >> "$seqres.full"
+_scratch_remount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+md5sum "$testdir/file2.chk" | _filter_scratch
+
+echo "CoW and unmount"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x62 -b $((blksz * 2)) $blksz $((blksz * 2))" -c "truncate $((blksz * 2))" "$testdir/file2" >> "$seqres.full"
+_scratch_remount
+"$XFS_IO_PROG" -f -c "pwrite -S 0x62 -b $((blksz * 2)) $blksz $((blksz * 2))" -c "truncate $((blksz * 2))" "$testdir/file2.chk" >> "$seqres.full"
+_scratch_remount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+md5sum "$testdir/file2.chk" | _filter_scratch
+
+echo "Check for damage"
+umount "$SCRATCH_MNT"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/253.out b/tests/generic/253.out
new file mode 100644
index 000..f7c07a0
--- /dev/null
+++ b/tests/generic/253.out
@@ -0,0 +1,13 @@
+QA output created by 253
+Format and mount
+Create the original files
+Compare files
+c946b71bb69c07daf25470742c967e7c  SCRATCH_MNT/test-253/file1
+c946b71bb69c07daf25470742c967e7c  SCRATCH_MNT/test-253/file2
+c946b71bb69c07daf25470742c967e7c  SCRATCH_MNT/test-253/file2.chk
+CoW and unmount
+Compare files
+c946b71bb69c07daf25470742c967e7c  

[PATCH 04/23] reflink: remove redundant filesystem checks from the end of the tests

2016-02-08 Thread Darrick J. Wong
Turns out that check already runs _check_filesystems after each test,
so we don't need to do this at the end of each test.

Signed-off-by: Darrick J. Wong 
---
 tests/generic/157 | 1 -
 tests/generic/158 | 1 -
 tests/generic/161 | 1 -
 tests/generic/162 | 1 -
 tests/generic/163 | 1 -
 tests/generic/164 | 1 -
 tests/generic/165 | 1 -
 tests/generic/166 | 1 -
 tests/generic/167 | 1 -
 tests/generic/168 | 1 -
 tests/generic/170 | 1 -
 tests/generic/171 | 1 -
 tests/generic/172 | 1 -
 tests/generic/173 | 1 -
 tests/generic/174 | 1 -
 tests/generic/175 | 1 -
 tests/generic/176 | 1 -
 tests/generic/183 | 1 -
 tests/generic/185 | 1 -
 tests/generic/186 | 1 -
 tests/generic/187 | 1 -
 tests/generic/188 | 1 -
 tests/generic/189 | 1 -
 tests/generic/190 | 1 -
 tests/generic/191 | 1 -
 tests/generic/194 | 1 -
 tests/generic/195 | 1 -
 tests/generic/196 | 1 -
 tests/generic/197 | 1 -
 tests/generic/199 | 1 -
 tests/generic/200 | 1 -
 tests/generic/201 | 1 -
 tests/generic/202 | 1 -
 tests/generic/203 | 1 -
 tests/generic/205 | 1 -
 tests/generic/206 | 1 -
 tests/generic/216 | 1 -
 tests/generic/217 | 1 -
 tests/generic/218 | 1 -
 tests/generic/220 | 1 -
 tests/generic/222 | 1 -
 tests/generic/227 | 1 -
 tests/generic/229 | 1 -
 tests/generic/238 | 1 -
 tests/generic/242 | 1 -
 tests/generic/243 | 1 -
 tests/xfs/127     | 1 -
 tests/xfs/128     | 1 -
 tests/xfs/131     | 1 -
 tests/xfs/139     | 1 -
 tests/xfs/140     | 1 -
 51 files changed, 51 deletions(-)


diff --git a/tests/generic/157 b/tests/generic/157
index 0150866..74314d8 100755
--- a/tests/generic/157
+++ b/tests/generic/157
@@ -123,7 +123,6 @@ _reflink_range "$testdir2/file1" 0 "$testdir2/file2" 0 $blksz >> "$seqres.full"
 
 echo "Check scratch fs"
 _scratch_unmount
-_check_scratch_fs
 
 # success, all done
 status=0
diff --git a/tests/generic/158 b/tests/generic/158
index 807c247..779d55e 100755
--- a/tests/generic/158
+++ b/tests/generic/158
@@ -124,7 +124,6 @@ _dedupe_range "$testdir2/file1" 0 "$testdir2/file2" 0 $blksz >> "$seqres.full"
 
 echo "Check scratch fs"
 _scratch_unmount
-_check_scratch_fs
 
 # success, all done
 status=0
diff --git a/tests/generic/161 b/tests/generic/161
index 7fb8963..b271936 100755
--- a/tests/generic/161
+++ b/tests/generic/161
@@ -71,7 +71,6 @@ wait
 
 echo "Check fs"
 umount "$SCRATCH_MNT"
-_check_scratch_fs
 
 echo "Done"
 # success, all done
diff --git a/tests/generic/162 b/tests/generic/162
index 2fb947a..30c761b 100755
--- a/tests/generic/162
+++ b/tests/generic/162
@@ -87,7 +87,6 @@ wait
 
 echo "Check fs"
 umount "$SCRATCH_MNT"
-_check_scratch_fs
 
 echo "Done"
 # success, all done
diff --git a/tests/generic/163 b/tests/generic/163
index 0186443..f2ea334 100755
--- a/tests/generic/163
+++ b/tests/generic/163
@@ -87,7 +87,6 @@ wait
 
 echo "Check fs"
 umount "$SCRATCH_MNT"
-_check_scratch_fs
 
 echo "Done"
 # success, all done
diff --git a/tests/generic/164 b/tests/generic/164
index 087c6ba..e97ac13 100755
--- a/tests/generic/164
+++ b/tests/generic/164
@@ -97,7 +97,6 @@ wait
 
 echo "Check fs"
 umount "$SCRATCH_MNT"
-_check_scratch_fs
 
 echo "Done"
 # success, all done
diff --git a/tests/generic/165 b/tests/generic/165
index 6bd15e1..b305079 100755
--- a/tests/generic/165
+++ b/tests/generic/165
@@ -97,7 +97,6 @@ wait
 
 echo "Check fs"
 umount "$SCRATCH_MNT"
-_check_scratch_fs
 
 echo "Done"
 # success, all done
diff --git a/tests/generic/166 b/tests/generic/166
index 6cfb821..2c2ff4e 100755
--- a/tests/generic/166
+++ b/tests/generic/166
@@ -84,7 +84,6 @@ wait
 
 echo "Check for damage"
 _scratch_unmount
-_check_scratch_fs
 
 echo "Done"
 
diff --git a/tests/generic/167 b/tests/generic/167
index fc5a86c..b80b481 100755
--- a/tests/generic/167
+++ b/tests/generic/167
@@ -84,7 +84,6 @@ wait
 
 echo "Check for damage"
 _scratch_unmount
-_check_scratch_fs
 
 echo "Done"
 
diff --git a/tests/generic/168 b/tests/generic/168
index ee3848d..0d620da 100755
--- a/tests/generic/168
+++ b/tests/generic/168
@@ -88,7 +88,6 @@ wait
 
 echo "Check for damage"
 umount "$SCRATCH_MNT"
-_check_scratch_fs
 
 echo "Done"
 
diff --git a/tests/generic/170 b/tests/generic/170
index 6d27810..78ed63d 100755
--- a/tests/generic/170
+++ b/tests/generic/170
@@ -88,7 +88,6 @@ wait
 
 echo "Check for damage"
 umount "$SCRATCH_MNT"
-_check_scratch_fs
 
 echo "Done"
 
diff --git a/tests/generic/171 b/tests/generic/171
index ec3729d..4b4f141 100755
--- a/tests/generic/171
+++ b/tests/generic/171
@@ -100,7 +100,6 @@ echo "${out}"
 
 echo "Check scratch fs"
 umount "$SCRATCH_MNT"
-_check_scratch_fs
 
 # success, all done
 status=0
diff --git a/tests/generic/172 b/tests/generic/172
index 1988c8d..98eb97f 100755
--- a/tests/generic/172
+++ b/tests/generic/172
@@ -100,7 +100,6 @@ echo "${out}"
 
 echo "Check scratch fs"
 umount "$SCRATCH_MNT"
-_check_scratch_fs
 
 # 

[PATCH 05/23] common/dmerror: add some more dmerror routines

2016-02-08 Thread Darrick J. Wong
Add functions to the dmerror routine so that we can load both the
error table and the linear table.  This will help us with EIO testing
of copy-on-write.
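The new routines swap the live device-mapper table between a pass-through
"linear" mapping and an "error" mapping with a suspend/load/resume cycle.
Below is a minimal dry-run sketch of that cycle; it only prints the
commands it would run. The helper names dm_tables and dm_swap_plan are
hypothetical and not part of common/dmerror; the device name error-test
and the optional second argument that drops --nolockfs follow the patch.

```bash
# Hypothetical helpers (not in common/dmerror); they print, never execute.
# Build both tables for a device of $2 512-byte sectors: the linear target
# passes I/O through to the backing device, the error target fails all I/O.
dm_tables() {
	local dev="$1" sectors="$2"
	echo "linear: 0 $sectors linear $dev 0"
	echo "error: 0 $sectors error"
}

# Print the suspend/load/resume sequence that swaps in table $1.
# As in the patch, a second argument of 1 drops --nolockfs so the
# filesystem is frozen (flushed) before the device is suspended.
dm_swap_plan() {
	local table="$1" suspend_opt="--nolockfs"
	[ $# -gt 1 ] && [ "$2" -eq 1 ] && suspend_opt=""
	echo "dmsetup suspend ${suspend_opt:+$suspend_opt }error-test"
	echo "dmsetup load error-test --table \"$table\""
	echo "dmsetup resume error-test"
}
```

For example, dm_swap_plan "0 2048 error" 1 prints the three commands
without --nolockfs.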

Signed-off-by: Darrick J. Wong 
---
 common/dmerror  |   27 +--
 tests/btrfs/100 |2 +-
 tests/btrfs/101 |2 +-
 3 files changed, 27 insertions(+), 4 deletions(-)


diff --git a/common/dmerror b/common/dmerror
index 3900a4e..004530d 100644
--- a/common/dmerror
+++ b/common/dmerror
@@ -46,15 +46,23 @@ _dmerror_mount()
_mount -t $FSTYP `_dmerror_mount_options $*`
 }
 
+_dmerror_unmount()
+{
+   umount $SCRATCH_MNT
+}
+
 _dmerror_cleanup()
 {
$UMOUNT_PROG $SCRATCH_MNT > /dev/null 2>&1
$DMSETUP_PROG remove error-test > /dev/null 2>&1
 }
 
-_dmerror_load_table()
+_dmerror_load_error_table()
 {
-   $DMSETUP_PROG suspend error-test
+   suspend_opt="--nolockfs"
+   [ $# -gt 1 ] && [ $2 -eq 1 ] && suspend_opt=""
+
+   $DMSETUP_PROG suspend $suspend_opt error-test
[ $? -ne 0 ] && _fail  "dmsetup suspend failed"
 
$DMSETUP_PROG load error-test --table "$DMERROR_TABLE"
@@ -63,3 +71,18 @@ _dmerror_load_table()
$DMSETUP_PROG resume error-test
[ $? -ne 0 ] && _fail  "dmsetup resume failed"
 }
+
+_dmerror_load_working_table()
+{
+   suspend_opt="--nolockfs"
+   [ $# -gt 1 ] && [ $2 -eq 1 ] && suspend_opt=""
+
+   $DMSETUP_PROG suspend $suspend_opt error-test
+   [ $? -ne 0 ] && _fail  "dmsetup suspend failed"
+
+   $DMSETUP_PROG load error-test --table "$DMLINEAR_TABLE"
+   [ $? -ne 0 ] && _fail "dmsetup failed to load working table"
+
+   $DMSETUP_PROG resume error-test
+   [ $? -ne 0 ] && _fail  "dmsetup resume failed"
+}
diff --git a/tests/btrfs/100 b/tests/btrfs/100
index 080d0ae..cd385e1 100755
--- a/tests/btrfs/100
+++ b/tests/btrfs/100
@@ -69,7 +69,7 @@ run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n 200 -p 8 $FSSTRESS_AVOID -x \
"$snapshot_cmd" -X 50
 
 # now load the error into the DMERROR_DEV
-_dmerror_load_table
+_dmerror_load_error_table
 
 _run_btrfs_util_prog replace start -B $error_devid $dev2 $SCRATCH_MNT
 
diff --git a/tests/btrfs/101 b/tests/btrfs/101
index 0824de1..8d7af85 100755
--- a/tests/btrfs/101
+++ b/tests/btrfs/101
@@ -70,7 +70,7 @@ run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n 200 -p 8 $FSSTRESS_AVOID -x \
"$snapshot_cmd" -X 50
 
 # now load the error into the DMERROR_DEV
-_dmerror_load_table
+_dmerror_load_error_table
 
 _run_btrfs_util_prog device delete $error_devid $SCRATCH_MNT
 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/23] xfs: more reflink tests

2016-02-08 Thread Darrick J. Wong
Create a couple of XFS-specific tests -- one to check that growing
and shrinking the refcount btree works and a second one to check
what happens when we hit maximum refcount.
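The first test (xfs/169) sizes its file as 8 * blksz / 12 blocks and then
reflinks every other block. Below is a rough sketch of that arithmetic,
assuming each XFS refcount record is 12 bytes and that shared blocks
separated by unshared ones cannot share a record; refcount_plan is a
hypothetical name used only for illustration.

```bash
# Hypothetical helper: reproduce xfs/169's sizing for block size $1.
# Assumption: a refcount record is 12 bytes, so 8 * blksz / 12 blocks is
# about eight blocks' worth of records if every block needed one.
refcount_plan() {
	local blksz="$1"
	local nr_blks=$((8 * blksz / 12))
	# The test reflinks blocks 1, 3, 5, ... up to nr_blks - 1.  Each shared
	# block is separated by an unshared one, so presumably each needs its
	# own record.  Arithmetic expansion strips any padding wc may add.
	local shared=$(( $(seq 1 2 $((nr_blks - 1)) | wc -l) ))
	echo "nr_blks=$nr_blks shared=$shared record_bytes=$((shared * 12))"
}
```

With 4 KiB blocks this comes to roughly 16 KB of records, several leaf
blocks' worth, which should be enough to force at least one interior level
in the refcount btree.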

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/169 |   90 
 tests/xfs/169.out |8 
 tests/xfs/179 |  119 +
 tests/xfs/179.out |   10 
 tests/xfs/group   |4 +-
 5 files changed, 230 insertions(+), 1 deletion(-)
 create mode 100755 tests/xfs/169
 create mode 100644 tests/xfs/169.out
 create mode 100755 tests/xfs/179
 create mode 100644 tests/xfs/179.out


diff --git a/tests/xfs/169 b/tests/xfs/169
new file mode 100755
index 000..e0fcc44
--- /dev/null
+++ b/tests/xfs/169
@@ -0,0 +1,90 @@
+#! /bin/bash
+# FS QA Test No. 169
+#
+# Ensure that we can create enough distinct reflink entries to force creation
+# of a multi-level refcount btree.  Delete and recreate a few times to
+# exercise the refcount btree grow/shrink functions.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	umount "$SCRATCH_MNT" > /dev/null 2>&1
+	rm -rf "$tmp".*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs xfs
+_require_scratch_reflink
+
+rm -f "$seqres.full"
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf "$testdir"
+mkdir "$testdir"
+
+echo "Create the original file blocks"
+blksz="$(stat -f "$testdir" -c '%S')"
+nr_blks=$((8 * blksz / 12))
+
+for i in 1 2 x; do
+	_pwrite_byte 0x61 0 $((blksz * nr_blks)) "$testdir/file1" >> "$seqres.full"
+
+	echo "$i: Reflink every other block"
+	seq 1 2 $((nr_blks - 1)) | while read nr; do
+		_reflink_range  "$testdir/file1" $((nr * blksz)) \
+			"$testdir/file2" $((nr * blksz)) $blksz >> "$seqres.full"
+	done
+	umount "$SCRATCH_MNT"
+	_check_scratch_fs
+	_scratch_mount
+
+	test "$i" = "x" && break
+
+	echo "$i: Delete both files"
+	rm -rf "$testdir/file1" "$testdir/file2"
+	umount "$SCRATCH_MNT"
+	_check_scratch_fs
+	_scratch_mount
+done
+
+echo "Check for damage"
+umount "$SCRATCH_MNT"
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/169.out b/tests/xfs/169.out
new file mode 100644
index 000..263f696
--- /dev/null
+++ b/tests/xfs/169.out
@@ -0,0 +1,8 @@
+QA output created by 169
+Create the original file blocks
+1: Reflink every other block
+1: Delete both files
+2: Reflink every other block
+2: Delete both files
+x: Reflink every other block
+Check for damage
diff --git a/tests/xfs/179 b/tests/xfs/179
new file mode 100755
index 000..4cdf862
--- /dev/null
+++ b/tests/xfs/179
@@ -0,0 +1,119 @@
+#! /bin/bash
+# FS QA Test No. 179
+#
+# See how well reflink handles overflowing reflink counts.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$

[PATCH 15/23] reflink: test xfs cow behavior when the filesystem crashes

2016-02-08 Thread Darrick J. Wong
Use the extent size hint to force leftover CoW reservations then
crash the filesystem to see how recovery works.
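The test writes a single byte at offset blksz * nr - 1 into a file whose
extent size hint is blksz * bsz (128 KiB with the test's values). Assuming
the CoW reservation is rounded out to hint-aligned boundaries, almost all
of the reserved range goes unused, which is what leaves leftover
reservations behind for recovery to deal with. A minimal sketch of that
rounding; extsz_aligned_range is a hypothetical helper.

```bash
# Hypothetical helper: for a write at byte offset $1 of length $2 with an
# extent size hint of $3 bytes, print the hint-aligned range a reservation
# would cover (assuming round-down of the start, round-up of the end) and
# how many reserved bytes the write itself leaves untouched.
extsz_aligned_range() {
	local off="$1" len="$2" extsz="$3"
	local start=$(( off / extsz * extsz ))
	local end=$(( (off + len + extsz - 1) / extsz * extsz ))
	echo "$start $end unused=$(( end - start - len ))"
}
```

For the test's 1-byte write at 65536 * 16 - 1 with a 131072-byte hint this
prints "917504 1048576 unused=131071".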

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/212 |  106 +
 tests/xfs/212.out |   14 +++
 tests/xfs/group   |1 +
 3 files changed, 121 insertions(+)
 create mode 100755 tests/xfs/212
 create mode 100644 tests/xfs/212.out


diff --git a/tests/xfs/212 b/tests/xfs/212
new file mode 100755
index 000..ccddf05
--- /dev/null
+++ b/tests/xfs/212
@@ -0,0 +1,106 @@
+#! /bin/bash
+# FS QA Test No. 212
+#
+# Test recovery of "lost" CoW blocks after a crash:
+# - Create two reflinked files.  Set extsz hint on second file.
+# - Dirty one byte on the second file and fsync.
+# - Crash the FS to test recovery.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	#rm -rf "$tmp".* "$testdir"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs xfs
+_require_scratch_reflink
+_require_cp_reflink
+_require_fiemap
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf $testdir
+mkdir $testdir
+
+blksz=65536
+nr=16
+bsz=2
+
+free_blocks=$(stat -f -c '%a' "$testdir")
+real_blksz=$(stat -f -c '%S' "$testdir")
+space_needed=$(((blksz * nr * 3) * 5 / 4))
+space_avail=$((free_blocks * real_blksz))
+internal_blks=$((blksz * nr / real_blksz))
+test $space_needed -gt $space_avail && _notrun "Not enough space. $space_avail < $space_needed"
+
+echo "Create the original files"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x61 -b $((blksz * bsz)) 0 $((blksz * nr))" "$testdir/file1" >> "$seqres.full"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x61 -b $((blksz * bsz)) 0 $((blksz * nr))" "$testdir/file2.chk" >> "$seqres.full"
+"$XFS_IO_PROG" -f -c "extsize $((blksz * bsz))" "$testdir/file2"
+_cp_reflink "$testdir/file1" "$testdir/file2" >> "$seqres.full"
+_scratch_remount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+md5sum "$testdir/file2.chk" | _filter_scratch
+
+echo "CoW and leave leftovers"
+"$XFS_IO_PROG" -f -c "extsize" "$testdir/file2" >> "$seqres.full"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x63 $((blksz * nr - 1)) 1" -c "fsync" "$testdir/file2" >> "$seqres.full"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x63 $((blksz * nr - 1)) 1" -c "fsync" "$testdir/file2.chk" >> "$seqres.full"
+sync
+
+echo "Crash and recover"
+"$XFS_IO_PROG" -x -c "shutdown" "$testdir/file2" >> "$seqres.full"
+_scratch_remount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+md5sum "$testdir/file2.chk" | _filter_scratch
+
+echo "Check for damage"
+umount "$SCRATCH_MNT"
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/212.out b/tests/xfs/212.out
new file mode 100644
index 000..24b35e2
--- /dev/null
+++ b/tests/xfs/212.out
@@ -0,0 +1,14 @@
+QA output created by 212
+Format and mount
+Create the original files
+Compare files
+7202826a7791073fe2787f0c94603278  SCRATCH_MNT/test-212/file1
+7202826a7791073fe2787f0c94603278  SCRATCH_MNT/test-212/file2
+7202826a7791073fe2787f0c94603278  SCRATCH_MNT/test-212/file2.chk
+CoW and leave leftovers
+Crash and recover
+Compare files
+7202826a7791073fe2787f0c94603278  SCRATCH_MNT/test-212/file1
+83feff041c88d5c746837552399dc27d  SCRATCH_MNT/test-212/file2
+83feff041c88d5c746837552399dc27d  SCRATCH_MNT/test-212/file2.chk
+Check for damage
diff --git a/tests/xfs/group b/tests/xfs/group
index 119e1fd..d4a0d59 100644
--- a/tests/xfs/group
+++ b/tests/xfs/group
@@ -209,6 +209,7 @@
 209 auto quick clone
 210 auto quick clone
 211 clone_stress
+212 auto quick clone
 216 log metadata auto quick
 217 log metadata auto
 220 auto quota quick


[PATCH 13/23] xfs: test fragmentation characteristics of copy-on-write

2016-02-08 Thread Darrick J. Wong
Perform copy-on-writes at random offsets to stress the CoW allocation
system.  Assess the effectiveness of the extent size hint at
combatting fragmentation via unshare, a rewrite, and no-op after the
random writes.
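The random writes need block-aligned offsets inside the shared file. A
minimal sketch of generating them in bash; random_block_offsets is a
hypothetical helper, and because $RANDOM only spans 0..32767 it assumes the
file has at most 32768 blocks (the modulo also introduces a slight bias,
which is harmless for scattering writes).

```bash
# Hypothetical helper: print $1 random, block-aligned byte offsets inside
# a file of $2 blocks of $3 bytes each, one per line.
random_block_offsets() {
	local count="$1" nr_blks="$2" blksz="$3" i
	for ((i = 0; i < count; i++)); do
		echo $(( (RANDOM % nr_blks) * blksz ))
	done
}
```

Setting RANDOM to a fixed value first makes a run reproducible, and each
offset could then be fed to something like
"$XFS_IO_PROG" -c "pwrite -S 0x63 $off $blksz" "$testdir/file2".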

Signed-off-by: Darrick J. Wong 
---
 tests/generic/301 |  105 +
 tests/generic/301.out |   11 
 tests/generic/302 |  105 +
 tests/generic/302.out |   11 
 tests/generic/group   |2 +
 tests/xfs/180 |  111 +++
 tests/xfs/180.out |   12 
 tests/xfs/182 |  111 +++
 tests/xfs/182.out |   13 
 tests/xfs/184 |  110 +++
 tests/xfs/184.out |   11 
 tests/xfs/192 |  110 +++
 tests/xfs/192.out |   11 
 tests/xfs/193 |  107 ++
 tests/xfs/193.out |   11 
 tests/xfs/198 |  107 ++
 tests/xfs/198.out |   11 
 tests/xfs/200 |  114 
 tests/xfs/200.out |   11 
 tests/xfs/204 |  114 
 tests/xfs/204.out |   11 
 tests/xfs/207 |  104 +
 tests/xfs/207.out |   10 +++
 tests/xfs/208 |  154 +
 tests/xfs/208.out |   15 +
 tests/xfs/209 |   88 
 tests/xfs/209.out |6 ++
 tests/xfs/210 |  125 
 tests/xfs/210.out |   14 
 tests/xfs/211 |  111 +++
 tests/xfs/211.out |   12 
 tests/xfs/group   |   13 
 32 files changed, 1861 insertions(+)
 create mode 100755 tests/generic/301
 create mode 100644 tests/generic/301.out
 create mode 100755 tests/generic/302
 create mode 100644 tests/generic/302.out
 create mode 100755 tests/xfs/180
 create mode 100644 tests/xfs/180.out
 create mode 100755 tests/xfs/182
 create mode 100644 tests/xfs/182.out
 create mode 100755 tests/xfs/184
 create mode 100644 tests/xfs/184.out
 create mode 100755 tests/xfs/192
 create mode 100644 tests/xfs/192.out
 create mode 100755 tests/xfs/193
 create mode 100644 tests/xfs/193.out
 create mode 100755 tests/xfs/198
 create mode 100644 tests/xfs/198.out
 create mode 100755 tests/xfs/200
 create mode 100644 tests/xfs/200.out
 create mode 100755 tests/xfs/204
 create mode 100644 tests/xfs/204.out
 create mode 100755 tests/xfs/207
 create mode 100644 tests/xfs/207.out
 create mode 100755 tests/xfs/208
 create mode 100644 tests/xfs/208.out
 create mode 100755 tests/xfs/209
 create mode 100644 tests/xfs/209.out
 create mode 100755 tests/xfs/210
 create mode 100644 tests/xfs/210.out
 create mode 100755 tests/xfs/211
 create mode 100644 tests/xfs/211.out


diff --git a/tests/generic/301 b/tests/generic/301
new file mode 100755
index 000..c4f70e1
--- /dev/null
+++ b/tests/generic/301
@@ -0,0 +1,105 @@
+#! /bin/bash
+# FS QA Test No. 301
+#
+# Test fragmentation after a lot of random CoW:
+# - Create two reflinked files.
+# - Buffered write to random offsets to scatter CoW reservations.
+# - Check the number of extents.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	#rm -rf "$tmp".* "$testdir"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+_require_fiemap
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf $testdir
+mkdir $testdir
+
+blksz=65536
+nr=128
+bsz=16
+
+free_blocks=$(stat -f -c '%a' "$testdir")

[PATCH 16/23] reflink: test quota accounting

2016-02-08 Thread Darrick J. Wong
Signed-off-by: Darrick J. Wong 
---
 common/reflink|2 -
 tests/generic/305 |  105 +++
 tests/generic/305.out |   23 ++
 tests/generic/326 |  105 +++
 tests/generic/326.out |   23 ++
 tests/generic/327 |   92 +
 tests/generic/327.out |   13 ++
 tests/generic/328 |  109 +
 tests/generic/328.out |   26 
 tests/generic/group   |4 ++
 tests/xfs/213 |  110 +
 tests/xfs/213.out |   23 ++
 tests/xfs/214 |  109 +
 tests/xfs/214.out |   23 ++
 tests/xfs/group   |2 +
 15 files changed, 768 insertions(+), 1 deletion(-)
 create mode 100755 tests/generic/305
 create mode 100644 tests/generic/305.out
 create mode 100755 tests/generic/326
 create mode 100644 tests/generic/326.out
 create mode 100755 tests/generic/327
 create mode 100644 tests/generic/327.out
 create mode 100755 tests/generic/328
 create mode 100644 tests/generic/328.out
 create mode 100755 tests/xfs/213
 create mode 100644 tests/xfs/213.out
 create mode 100755 tests/xfs/214
 create mode 100644 tests/xfs/214.out


diff --git a/common/reflink b/common/reflink
index 3d6a8c1..139e00e 100644
--- a/common/reflink
+++ b/common/reflink
@@ -153,7 +153,7 @@ _cp_reflink() {
file1="$1"
file2="$2"
 
-   cp --reflink=always "$file1" "$file2"
+   cp --reflink=always -p "$file1" "$file2"
 }
 
 # Reflink some file1 into file2
diff --git a/tests/generic/305 b/tests/generic/305
new file mode 100755
index 000..5721dd0
--- /dev/null
+++ b/tests/generic/305
@@ -0,0 +1,105 @@
+#! /bin/bash
+# FS QA Test No. 305
+#
+# Ensure that quota charges us for reflinking a file and that we're not
+# charged for buffered copy on write.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	#rm -rf "$tmp".* "$testdir"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+. ./common/quota
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+_require_fiemap
+_require_quota
+_need_to_be_root
+_require_nobody
+
+_repquota() {
+   repquota "$SCRATCH_MNT" | egrep '^(fsgqa|root|nobody)'
+}
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+export MOUNT_OPTIONS="-o usrquota,grpquota $MOUNT_OPTIONS"
+_scratch_mount >> "$seqres.full" 2>&1
+quotacheck -u -g "$SCRATCH_MNT" 2> /dev/null
+quotaon "$SCRATCH_MNT" 2> /dev/null
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf $testdir
+mkdir $testdir
+
+sz=1048576
+echo "Create the original files"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x61 -b $sz 0 $sz" "$testdir/file1" >> "$seqres.full"
+_cp_reflink "$testdir/file1" "$testdir/file2" >> "$seqres.full"
+_cp_reflink "$testdir/file1" "$testdir/file3" >> "$seqres.full"
+touch "$testdir/urk"
+chown nobody "$testdir/urk"
+touch "$testdir/erk"
+chown fsgqa "$testdir/erk"
+_repquota
+_scratch_remount
+
+echo "Change file ownership"
+chown fsgqa "$testdir/file1"
+chown fsgqa "$testdir/file2"
+chown fsgqa "$testdir/file3"
+_repquota
+
+echo "CoW one of the files"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x63 -b $((sz/2)) 0 $((sz/2))" -c "fsync" "$testdir/file2" >> "$seqres.full"
+_repquota
+
+echo "Remount the FS to see if accounting changes"
+_scratch_remount
+_repquota
+
+echo "Chown one of the files"
+chown nobody "$testdir/file3"
+_repquota
+
+echo "Check for damage"
+umount "$SCRATCH_MNT"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/305.out b/tests/generic/305.out
new file mode 100644
index 000..2acfe04
--- /dev/null
+++ b/tests/generic/305.out
@@ -0,0 +1,23 @@
+QA output created by 305
+Format and mount
+Create the 

[PATCH 09/23] reflink: test CoW operations against the source file

2016-02-08 Thread Darrick J. Wong
Ensure that CoW operations against shared blocks in the source file
work correctly.
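The property these tests check can be shown in miniature: after a reflink
copy, rewriting a block of the *source* must leave the copy untouched.
This sketch assumes GNU coreutils (cp --reflink=auto falls back to an
ordinary copy where reflink is unsupported), so it demonstrates the
expected outcome rather than block sharing itself; cow_source_demo is a
hypothetical name.

```bash
# Hypothetical demo: returns "ok" if overwriting a block in the source of
# a (possibly reflinked) copy changes the source but not the copy.
cow_source_demo() {
	local dir blksz=4096 nr=8 res
	dir=$(mktemp -d) || return 1
	head -c $((blksz * nr)) /dev/zero | tr '\0' 'a' > "$dir/file1"
	cp --reflink=auto "$dir/file1" "$dir/file2"
	cp "$dir/file1" "$dir/file2.chk"	# independent reference copy
	# Overwrite block 3 of the source in place; conv=notrunc keeps its size.
	head -c "$blksz" /dev/zero | tr '\0' 'c' |
		dd of="$dir/file1" bs="$blksz" seek=3 conv=notrunc 2>/dev/null
	if cmp -s "$dir/file2" "$dir/file2.chk" && ! cmp -s "$dir/file1" "$dir/file2"; then
		res="ok"
	else
		res="FAIL"
	fi
	rm -rf "$dir"
	echo "$res"
}
```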

v2: remove filefrag dependencies

Signed-off-by: Darrick J. Wong 
---
 tests/generic/196 |2 -
 tests/generic/197 |2 -
 tests/generic/284 |   95 
 tests/generic/284.out |   13 ++
 tests/generic/287 |   95 
 tests/generic/287.out |   13 ++
 tests/generic/289 |  102 +++
 tests/generic/289.out |   13 ++
 tests/generic/290 |  102 +++
 tests/generic/290.out |   13 ++
 tests/generic/291 |  102 +++
 tests/generic/291.out |   13 ++
 tests/generic/292 |  102 +++
 tests/generic/292.out |   13 ++
 tests/generic/293 |  107 +
 tests/generic/293.out |   13 ++
 tests/generic/295 |  107 +
 tests/generic/295.out |   13 ++
 tests/generic/296 |   96 
 tests/generic/296.out |   13 ++
 tests/generic/group   |9 
 21 files changed, 1036 insertions(+), 2 deletions(-)
 create mode 100755 tests/generic/284
 create mode 100644 tests/generic/284.out
 create mode 100755 tests/generic/287
 create mode 100644 tests/generic/287.out
 create mode 100755 tests/generic/289
 create mode 100644 tests/generic/289.out
 create mode 100755 tests/generic/290
 create mode 100644 tests/generic/290.out
 create mode 100755 tests/generic/291
 create mode 100644 tests/generic/291.out
 create mode 100755 tests/generic/292
 create mode 100644 tests/generic/292.out
 create mode 100755 tests/generic/293
 create mode 100644 tests/generic/293.out
 create mode 100755 tests/generic/295
 create mode 100644 tests/generic/295.out
 create mode 100755 tests/generic/296
 create mode 100644 tests/generic/296.out


diff --git a/tests/generic/196 b/tests/generic/196
index 4da9c76..11ecebb 100755
--- a/tests/generic/196
+++ b/tests/generic/196
@@ -2,7 +2,7 @@
 # FS QA Test No. 196
 #
 # Ensuring that copy on write in direct-io mode works when the CoW
-# range originally covers multiple extents, some unwritten, some not.
+# range originally covers multiple extents, some regular, some not.
 #   - Create two files.
 #   - Reflink the odd blocks of the first file into the second file.
 #   - directio CoW across the halfway mark, starting with the unwritten extent.
diff --git a/tests/generic/197 b/tests/generic/197
index 54ee5ab..72c2cb3 100755
--- a/tests/generic/197
+++ b/tests/generic/197
@@ -2,7 +2,7 @@
 # FS QA Test No. 197
 #
 # Ensuring that copy on write in buffered mode works when the CoW
-# range originally covers multiple extents, some unwritten, some not.
+# range originally covers multiple extents, some regular, some not.
 #   - Create two files.
 #   - Reflink the odd blocks of the first file into the second file.
 #   - CoW across the halfway mark, starting with the unwritten extent.
diff --git a/tests/generic/284 b/tests/generic/284
new file mode 100755
index 000..2a94bd1
--- /dev/null
+++ b/tests/generic/284
@@ -0,0 +1,95 @@
+#! /bin/bash
+# FS QA Test No. 284
+#
+# Ensuring that copy on write in buffered mode to the source file works when
+# the CoW range covers regular unshared and regular shared blocks.
+#   - Create two files.
+#   - Reflink the odd blocks of the first file into the second file.
+#   - CoW the first file across the halfway mark, starting with the
+# regular extent.
+#   - Check that the files are now different where we say they're different.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -rf "$tmp".*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# 

[PATCH 08/23] reflink: test CoW behavior with IO errors

2016-02-08 Thread Darrick J. Wong
Test various scenarios (with dm-flakey) where we simulate write
failures during CoW, to see if the FS can get through it without
blowing up or corrupting data.  Plumb in a FS-generic method to
sort out repairing filesystems after they get hit by IO errors.
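The XFS branch of the new _repair_scratch_fs helper is a small decision
tree: repair, and if repair reports a dirty log, try to replay it by
mounting; if the mount fails, zap the log and repair again. Below is a
dry-run sketch of that tree, assuming (as the patch's "replay log?" message
implies) that xfs_repair exits with status 2 when the log is dirty.
repair_plan is a hypothetical name; it only prints the steps.

```bash
# Hypothetical dry run: $1 is the first xfs_repair exit status, $2 the
# exit status of the log-replay mount (only consulted when $1 is 2).
repair_plan() {
	local repair_rc="$1" mount_rc="${2:-0}"
	if [ "$repair_rc" -ne 2 ]; then
		echo "done (repair status $repair_rc)"
	elif [ "$mount_rc" -gt 0 ]; then
		echo "mount failed: zap log with xfs_repair -L, then repair again"
	else
		echo "mount replayed log: unmount, then repair again"
	fi
}
```

In a real helper it is safer to save the final repair's status in a
variable before reporting it than to read $? right after fi, because an if
statement whose branches did not run exits with status 0.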

Signed-off-by: Darrick J. Wong 
---
 common/rc |   28 +
 tests/generic/265 |  102 +++
 tests/generic/265.out |   11 +
 tests/generic/266 |  103 +++
 tests/generic/266.out |   12 +
 tests/generic/267 |  103 +++
 tests/generic/267.out |   10 +
 tests/generic/268 |  106 +
 tests/generic/268.out |   12 +
 tests/generic/271 |  102 +++
 tests/generic/271.out |   11 +
 tests/generic/272 |  103 +++
 tests/generic/272.out |   12 +
 tests/generic/276 |  103 +++
 tests/generic/276.out |   11 +
 tests/generic/278 |  106 +
 tests/generic/278.out |   12 +
 tests/generic/279 |  103 +++
 tests/generic/279.out |   11 +
 tests/generic/281 |  104 
 tests/generic/281.out |   12 +
 tests/generic/282 |  104 
 tests/generic/282.out |   10 +
 tests/generic/283 |  107 +
 tests/generic/283.out |   12 +
 tests/generic/group   |   12 +
 26 files changed, 1422 insertions(+)
 create mode 100755 tests/generic/265
 create mode 100644 tests/generic/265.out
 create mode 100755 tests/generic/266
 create mode 100644 tests/generic/266.out
 create mode 100755 tests/generic/267
 create mode 100644 tests/generic/267.out
 create mode 100755 tests/generic/268
 create mode 100644 tests/generic/268.out
 create mode 100755 tests/generic/271
 create mode 100644 tests/generic/271.out
 create mode 100755 tests/generic/272
 create mode 100644 tests/generic/272.out
 create mode 100755 tests/generic/276
 create mode 100644 tests/generic/276.out
 create mode 100755 tests/generic/278
 create mode 100644 tests/generic/278.out
 create mode 100755 tests/generic/279
 create mode 100644 tests/generic/279.out
 create mode 100755 tests/generic/281
 create mode 100644 tests/generic/281.out
 create mode 100755 tests/generic/282
 create mode 100644 tests/generic/282.out
 create mode 100755 tests/generic/283
 create mode 100644 tests/generic/283.out


diff --git a/common/rc b/common/rc
index 863d4b3..467c217 100644
--- a/common/rc
+++ b/common/rc
@@ -953,6 +953,34 @@ _scratch_xfs_repair()
 $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
 }
 
+_repair_scratch_fs()
+{
+	case $FSTYP in
+	xfs)
+		_scratch_xfs_repair "$@"
+		res=$?
+		if [ "$res" -eq 2 ]; then
+			echo "xfs_repair returns $res; replay log?"
+			_scratch_mount
+			res=$?
+			if [ "$res" -gt 0 ]; then
+				echo "mount returns $res; zap log?"
+				_scratch_xfs_repair -L
+				echo "log zap returns $?"
+			else
+				umount "$SCRATCH_MNT"
+			fi
+			_scratch_xfs_repair "$@"
+		fi
+		echo "error $?"
+		;;
+	*)
+		# Let's hope fsck -y suffices...
+		fsck -t $FSTYP -y $SCRATCH_DEV
+		;;
+	esac
+}
+
 _get_pids_by_name()
 {
 if [ $# -ne 1 ]
diff --git a/tests/generic/265 b/tests/generic/265
new file mode 100755
index 000..e91b307
--- /dev/null
+++ b/tests/generic/265
@@ -0,0 +1,102 @@
+#! /bin/bash
+# FS QA Test No. 265
+#
+# Test CoW behavior when the write temporarily fails.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1

[PATCH 18/23] xfs: test the automatic cowextsize extent garbage collector

2016-02-08 Thread Darrick J. Wong
Signed-off-by: Darrick J. Wong 
---
 tests/xfs/231 |  135 
 tests/xfs/231.out |   17 +++
 tests/xfs/232 |  137 +
 tests/xfs/232.out |   17 +++
 tests/xfs/group   |2 +
 5 files changed, 308 insertions(+)
 create mode 100755 tests/xfs/231
 create mode 100644 tests/xfs/231.out
 create mode 100755 tests/xfs/232
 create mode 100644 tests/xfs/232.out


diff --git a/tests/xfs/231 b/tests/xfs/231
new file mode 100755
index 000..d9ae102
--- /dev/null
+++ b/tests/xfs/231
@@ -0,0 +1,135 @@
+#! /bin/bash
+# FS QA Test No. 231
+#
+# Test recovery of unused CoW reservations:
+# - Create two reflinked files.  Set extsz hint on second file.
+# - Dirty a single byte on a number of CoW reservations in the second file.
+# - Fsync to flush out the dirty pages.
+# - Wait for the reclaim to run.
+# - Write more and see how bad fragmentation is.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	echo $old_cow_lifetime > /proc/sys/fs/xfs/speculative_cow_prealloc_lifetime
+	#rm -rf "$tmp".* "$testdir"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs xfs
+_require_scratch_reflink
+_require_cp_reflink
+_require_fiemap
+
+old_cow_lifetime=$(cat /proc/sys/fs/xfs/speculative_cow_prealloc_lifetime)
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf $testdir
+mkdir $testdir
+
+blksz=65536
+nr=64
+bsz=2
+
+free_blocks=$(stat -f -c '%a' "$testdir")
+real_blksz=$(stat -f -c '%S' "$testdir")
+space_needed=$(((blksz * nr * 3) * 5 / 4))
+space_avail=$((free_blocks * real_blksz))
+internal_blks=$((blksz * nr / real_blksz))
+test $space_needed -gt $space_avail && _notrun "Not enough space. $space_avail < $space_needed"
+
+echo "Create the original files"
+"$XFS_IO_PROG" -c "cowextsize $((blksz * bsz))" "$testdir"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x61 -b $((blksz * bsz)) 0 $((blksz * nr))" "$testdir/file1" >> "$seqres.full"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x61 -b $((blksz * bsz)) 0 $((blksz * nr))" "$testdir/file2.chk" >> "$seqres.full"
+_cp_reflink "$testdir/file1" "$testdir/file2" >> "$seqres.full"
+_scratch_remount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+md5sum "$testdir/file2.chk" | _filter_scratch
+
+echo "CoW and leave leftovers"
+echo 2 > /proc/sys/fs/xfs/speculative_cow_prealloc_lifetime
+seq 2 2 $((nr - 1)) | while read f; do
+   "$XFS_IO_PROG" -f -c "pwrite -S 0x63 $((blksz * f - 1)) 1" "$testdir/file2" >> "$seqres.full"
+   "$XFS_IO_PROG" -f -c "pwrite -S 0x63 $((blksz * f - 1)) 1" "$testdir/file2.chk" >> "$seqres.full"
+done
+sync
+
+echo "Wait for CoW expiration"
+sleep 3
+
+echo "Allocate free space"
+for i in $(seq 1 32); do
+   "$XFS_IO_PROG" -f -c "falloc 0 1" "$testdir/junk.$i" >> "$seqres.full"
+done
+"$XFS_IO_PROG" -f -c "falloc 0 $((blksz * nr))" "$testdir/junk" >> "$seqres.full"
+
+echo "CoW and leave leftovers"
+echo $old_cow_lifetime > /proc/sys/fs/xfs/speculative_cow_prealloc_lifetime
+seq 2 2 $((nr - 1)) | while read f; do
+   "$XFS_IO_PROG" -f -c "pwrite -S 0x63 $((blksz * f)) 1" "$testdir/file2" >> "$seqres.full"
+   "$XFS_IO_PROG" -f -c "pwrite -S 0x63 $((blksz * f)) 1" "$testdir/file2.chk" >> "$seqres.full"
+done
+sync
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+md5sum "$testdir/file2.chk" | _filter_scratch
+
+echo "Check extent counts"
+old_extents=$(_count_extents "$testdir/file1")
+new_extents=$(_count_extents "$testdir/file2")
+
+echo "old extents: $old_extents" >> "$seqres.full"
+echo "new extents: $new_extents" >> "$seqres.full"

[PATCH 20/23] reflink: test aio copy on write

2016-02-08 Thread Darrick J. Wong
Make sure that copy on write works with the AIO path.

Signed-off-by: Darrick J. Wong 
---
 tests/generic/329 |  105 
 tests/generic/329.out |   12 +
 tests/generic/330 |   96 
 tests/generic/330.out |   11 +
 tests/generic/331 |  107 +
 tests/generic/331.out |   12 +
 tests/generic/332 |   97 
 tests/generic/332.out |   11 +
 tests/generic/group   |4 ++
 9 files changed, 455 insertions(+)
 create mode 100755 tests/generic/329
 create mode 100644 tests/generic/329.out
 create mode 100755 tests/generic/330
 create mode 100644 tests/generic/330.out
 create mode 100755 tests/generic/331
 create mode 100644 tests/generic/331.out
 create mode 100755 tests/generic/332
 create mode 100644 tests/generic/332.out


diff --git a/tests/generic/329 b/tests/generic/329
new file mode 100755
index 000..cf6dab0
--- /dev/null
+++ b/tests/generic/329
@@ -0,0 +1,105 @@
+#! /bin/bash
+# FS QA Test No. 329
+#
+# Test AIO DIO CoW behavior when the write temporarily fails.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -rf "$tmp".* "$TEST_DIR/moo"
+	_dmerror_cleanup
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+. ./common/dmerror
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+_require_dm_target error
+_require_aiodio "aiocp"
+
+rm -f "$seqres.full"
+
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_dmerror_init
+_dmerror_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf $testdir
+mkdir $testdir
+
+blksz=65536
+nr=640
+bsz=128
+
+free_blocks=$(stat -f -c '%a' "$testdir")
+real_blksz=$(stat -f -c '%S' "$testdir")
+space_needed=$(((blksz * nr * 3) * 5 / 4))
+space_avail=$((free_blocks * real_blksz))
+test $space_needed -gt $space_avail && _notrun "Not enough space. $space_avail < $space_needed"
+
+echo "Create the original files"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x61 -b $((blksz * bsz)) 0 $((blksz * nr))" "$testdir/file1" >> "$seqres.full"
+_cp_reflink "$testdir/file1" "$testdir/file2" >> "$seqres.full"
+_dmerror_unmount
+_dmerror_mount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+
+echo "CoW and unmount"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x63 $((blksz * bsz)) 1" "$testdir/file2" >> "$seqres.full"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x63 -b $((blksz * bsz)) 0 $((blksz * nr))" "$TEST_DIR/moo" >> "$seqres.full"
+sync
+_dmerror_load_error_table
+"$AIO_TEST" -f DIRECT -b $((blksz * bsz)) "$TEST_DIR/moo" "$testdir/file2" >> "$seqres.full"
+_dmerror_load_working_table
+_dmerror_unmount
+_dmerror_mount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+
+echo "Check for damage"
+_dmerror_unmount
+_dmerror_cleanup
+_repair_scratch_fs >> "$seqres.full" 2>&1
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/329.out b/tests/generic/329.out
new file mode 100644
index 000..f14726c
--- /dev/null
+++ b/tests/generic/329.out
@@ -0,0 +1,12 @@
+QA output created by 329
+Format and mount
+Create the original files
+Compare files
+1886e67cf8783e89ce6ddc5bb09a3944  SCRATCH_MNT/test-329/file1
+1886e67cf8783e89ce6ddc5bb09a3944  SCRATCH_MNT/test-329/file2
+CoW and unmount
+write missed bytes expect 8388608 got 0
+Compare files
+1886e67cf8783e89ce6ddc5bb09a3944  SCRATCH_MNT/test-329/file1
+d94b0ab13385aba594411c174b1cc13c  SCRATCH_MNT/test-329/file2
+Check for damage
diff --git a/tests/generic/330 b/tests/generic/330
new file mode 100755
index 000..d720f58
--- /dev/null
+++ b/tests/generic/330
@@ -0,0 +1,96 @@
+#! /bin/bash
+# FS QA Test No. 330
+#
+# Test AIO DIO CoW behavior.
+#

[PATCH 23/23] reflink: test reflink+cow+enospc all at the same time

2016-02-08 Thread Darrick J. Wong
Set up a deliberately tiny filesystem and try to reflink and rewrite a
file on it to see what happens when we hit ENOSPC.  This is basically
generic/166 and generic/167, but with a constrained fs size.
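The snapshot loop shared by generic/166, generic/167 and generic/333 can be exercised on its own. The sketch below is a minimal version that makes two assumptions. First, a hypothetical `fake_cp_reflink` stub stands in for the real `_cp_reflink` helper. Second, an unconditional `while true` replaces the check for the `finished` marker file. In the loop, an ENOSPC message ends the loop silently. Any other output is echoed, where it would show up as a golden-output mismatch, and any other failure also ends the loop.

```shell
#!/bin/bash
# Hypothetical stand-in for _cp_reflink: succeeds for snap_0..snap_4,
# then fails the way cp does once the filesystem is full.
fake_cp_reflink() {
	local idx=${2#snap_}
	if [ "$idx" -ge 5 ]; then
		echo "cp: failed to clone '$2' from '$1': No space left on device" >&2
		return 1
	fi
	return 0
}

# Same control flow as the patched snappy(): ENOSPC ends the loop quietly,
# other output is echoed, and any other non-zero status also ends the loop.
snappy() {
	local n=0 out res
	while true; do
		out="$(fake_cp_reflink file1 "snap_$n" 2>&1)"
		res=$?
		echo "$out" | grep -q "No space left" && break
		test -n "$out" && echo "$out"
		test $res -ne 0 && break
		n=$((n + 1))
	done
	echo "snapshots made: $n"
}

snappy
```

Running it should print `snapshots made: 5`, because the sixth clone attempt hits the simulated ENOSPC and is not counted. Capturing `$?` on its own line right after the plain `out="$(...)"` assignment matters. Writing `local out="$(...)"` instead would report the exit status of `local` rather than that of the clone.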

Signed-off-by: Darrick J. Wong 
---
 tests/generic/166 |6 ++-
 tests/generic/167 |6 ++-
 tests/generic/333 |  104 +
 tests/generic/333.out |6 +++
 tests/generic/334 |  104 +
 tests/generic/334.out |6 +++
 tests/generic/group   |2 +
 7 files changed, 232 insertions(+), 2 deletions(-)
 create mode 100755 tests/generic/333
 create mode 100644 tests/generic/333.out
 create mode 100755 tests/generic/334
 create mode 100644 tests/generic/334.out


diff --git a/tests/generic/166 b/tests/generic/166
index 2c2ff4e..30d76a0 100755
--- a/tests/generic/166
+++ b/tests/generic/166
@@ -69,7 +69,11 @@ _scratch_remount
 snappy() {
 	n=0
 	while [ ! -e "$testdir/finished" ]; do
-		_cp_reflink "$testdir/file1" "$testdir/snap_$n" || break
+		out="$(_cp_reflink "$testdir/file1" "$testdir/snap_$n" 2>&1)"
+		res=$?
+		echo "$out" | grep -q "No space left" && break
+		test -n "$out" && echo "$out"
+		test $res -ne 0 && break
 		n=$((n + 1))
 	done
 }
diff --git a/tests/generic/167 b/tests/generic/167
index b80b481..5198a81 100755
--- a/tests/generic/167
+++ b/tests/generic/167
@@ -69,7 +69,11 @@ _scratch_remount
 snappy() {
 	n=0
 	while [ ! -e "$testdir/finished" ]; do
-		_cp_reflink "$testdir/file1" "$testdir/snap_$n" || break
+		out="$(_cp_reflink "$testdir/file1" "$testdir/snap_$n" 2>&1)"
+		res=$?
+		echo "$out" | grep -q "No space left" && break
+		test -n "$out" && echo "$out"
+		test $res -ne 0 && break
 		n=$((n + 1))
 	done
 }
diff --git a/tests/generic/333 b/tests/generic/333
new file mode 100755
index 000..4ca7803
--- /dev/null
+++ b/tests/generic/333
@@ -0,0 +1,104 @@
+#! /bin/bash
+# FS QA Test No. 333
+#
+# Test for races or FS corruption when trying to hit ENOSPC while DIO writing
+# to a file that's also the source of a reflink operation.
+#
+#---
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+	cd /
+	rm -rf "$tmp".*
+	wait
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+
+echo "Format and mount"
+_scratch_mkfs_sized $((400 * 1048576)) > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf "$testdir"
+mkdir "$testdir"
+
+loops=1024
+nr_loops=$((loops - 1))
+blksz=65536
+
+echo "Initialize file"
+echo > "$seqres.full"
+_pwrite_byte 0x61 0 $((loops * blksz)) "$testdir/file1" >> "$seqres.full"
+_scratch_remount
+
+# Snapshot creator...
+snappy() {
+	n=0
+	while [ ! -e "$testdir/finished" ]; do
+		out="$(_cp_reflink "$testdir/file1" "$testdir/snap_$n" 2>&1)"
+		res=$?
+		echo "$out" | grep -q "No space left" && break
+		test -n "$out" && echo "$out"
+		test $res -ne 0 && break
+		n=$((n + 1))
+	done
+	touch "$testdir/abort"
+}
+
+echo "Snapshot a file undergoing directio rewrite"
+snappy &
+seq 1 1000 | while read i; do
+	seq $nr_loops -1 0 | while read i; do
+		out="$(_pwrite_byte 0x63 $((i * blksz)) $blksz -d "$testdir/file1" 2>&1)"
+		echo "$out" >> "$seqres.full"
+		echo "$out" | grep -q "No space left" && touch "$testdir/abort"
+		echo "$out" | grep -qi "error" && touch "$testdir/abort"
+		test -e "$testdir/abort" && break
+	done
+	test -e "$testdir/abort" && break
+done
+touch 

[PATCH 17/23] reflink: test CoW across a mixed range of block types with cowextsize set

2016-02-08 Thread Darrick J. Wong
Signed-off-by: Darrick J. Wong 
---
 tests/xfs/215 |  108 ++
 tests/xfs/215.out |   14 +
 tests/xfs/218 |  108 ++
 tests/xfs/218.out |   14 +
 tests/xfs/219 |  108 ++
 tests/xfs/219.out |   14 +
 tests/xfs/221 |  108 ++
 tests/xfs/221.out |   14 +
 tests/xfs/223 |  113 
 tests/xfs/223.out |   14 +
 tests/xfs/224 |  113 
 tests/xfs/224.out |   14 +
 tests/xfs/225 |  108 ++
 tests/xfs/225.out |   14 +
 tests/xfs/226 |  108 ++
 tests/xfs/226.out |   14 +
 tests/xfs/228 |  137 +
 tests/xfs/228.out |   14 +
 tests/xfs/230 |  137 +
 tests/xfs/230.out |   14 +
 tests/xfs/group   |   10 
 21 files changed, 1298 insertions(+)
 create mode 100755 tests/xfs/215
 create mode 100644 tests/xfs/215.out
 create mode 100755 tests/xfs/218
 create mode 100644 tests/xfs/218.out
 create mode 100755 tests/xfs/219
 create mode 100644 tests/xfs/219.out
 create mode 100755 tests/xfs/221
 create mode 100644 tests/xfs/221.out
 create mode 100755 tests/xfs/223
 create mode 100644 tests/xfs/223.out
 create mode 100755 tests/xfs/224
 create mode 100644 tests/xfs/224.out
 create mode 100755 tests/xfs/225
 create mode 100644 tests/xfs/225.out
 create mode 100755 tests/xfs/226
 create mode 100644 tests/xfs/226.out
 create mode 100755 tests/xfs/228
 create mode 100644 tests/xfs/228.out
 create mode 100755 tests/xfs/230
 create mode 100644 tests/xfs/230.out


diff --git a/tests/xfs/215 b/tests/xfs/215
new file mode 100755
index 000..8dd5cb5
--- /dev/null
+++ b/tests/xfs/215
@@ -0,0 +1,108 @@
+#! /bin/bash
+# FS QA Test No. 215
+#
+# Ensuring that copy on write in direct-io mode works when the CoW
+# range originally covers multiple extents, some unwritten, some not.
+#   - Set cowextsize hint.
+#   - Create a file and fallocate a second file.
+#   - Reflink the odd blocks of the first file into the second file.
+#   - directio CoW across the halfway mark, starting with the unwritten extent.
+#   - Check that the files are now different where we say they're different.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -rf "$tmp".*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_xfs_io_command "falloc"
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf $testdir
+mkdir $testdir
+
+echo "Create the original files"
+blksz=65536
+nr=64
+real_blksz=$(stat -f -c '%S' "$testdir")
+internal_blks=$((blksz * nr / real_blksz))
+"$XFS_IO_PROG" -c "cowextsize $((blksz * 16))" "$testdir" >> "$seqres.full"
+_pwrite_byte 0x61 0 $((blksz * nr)) "$testdir/file1" >> "$seqres.full"
+$XFS_IO_PROG -f -c "falloc 0 $((blksz * nr))" "$testdir/file3" >> "$seqres.full"
+_pwrite_byte 0x00 0 $((blksz * nr)) "$testdir/file3.chk" >> "$seqres.full"
+seq 0 2 $((nr-1)) | while read f; do
+   _reflink_range "$testdir/file1" $((blksz * f)) "$testdir/file3" $((blksz * f)) $blksz >> "$seqres.full"
+   _pwrite_byte 0x61 $((blksz * f)) $blksz "$testdir/file3.chk" >> "$seqres.full"
+done
+_scratch_remount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file3" | _filter_scratch
+md5sum "$testdir/file3.chk" | _filter_scratch
+
+echo "directio CoW across the transition"
+"$XFS_IO_PROG" -d -f -c "pwrite -S 0x63 -b 

[PATCH 14/23] reflink: high offset reflink and dedupe tests

2016-02-08 Thread Darrick J. Wong
Ensure that we can pass absurdly enormous offsets and lengths to
reflink/dedupe and that the kernel survives them.
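The large offsets in generic/303 follow from simple 64-bit arithmetic. `bigoff` is 2^63 - 2 and `len` is 2^63 - 1, the largest signed 64-bit value. That puts the only 0x61 byte in file1 65534 bytes past the 64KiB boundary `bigoff_64k`. The sketch below recomputes those values with shell arithmetic, assuming a shell whose integers are 64-bit (as bash's are on Linux). It also shows why the golden output expects that byte at 0x10fffe in file6.

```shell
#!/bin/bash
# Recompute the generic/303 offsets; the constants are copied from the test.
bigoff=9223372036854775806      # 2^63 - 2
blk=65536                       # 64KiB alignment used for bigoff_64k

# Round bigoff down to a 64KiB boundary (integer division truncates).
bigoff_64k=$(( bigoff / blk * blk ))

# Distance from that boundary to the single 0x61 byte in file1.
delta=$(( bigoff - bigoff_64k ))

# The test clones 65535 bytes from bigoff_64k in file1 to offset 1048576 in
# file6; delta < 65535, so the byte is inside the cloned range and lands at:
dest=$(( 1048576 + delta ))

echo "bigoff_64k=$bigoff_64k"
echo "delta=$delta"
printf 'dest=%d (0x%x)\n' "$dest" "$dest"
```

The script prints `bigoff_64k=9223372036854710272`, `delta=65534` and `dest=1114110 (0x10fffe)`. These values match the test's own `bigoff_64k` constant, its `pread -v -q 1114110 1` check on file6, and the `0010fffe:  61  a` line in 303.out.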

v2: Ask for dedupe in the dedupe test.

Signed-off-by: Darrick J. Wong 
[h...@lst.de: call _require_test_dedupe]
Signed-off-by: Christoph Hellwig 
---
 tests/generic/303 |   99 +
 tests/generic/303.out |   21 ++
 tests/generic/304 |  100 +
 tests/generic/304.out |   22 +++
 tests/generic/group   |2 +
 5 files changed, 244 insertions(+)
 create mode 100755 tests/generic/303
 create mode 100644 tests/generic/303.out
 create mode 100755 tests/generic/304
 create mode 100644 tests/generic/304.out


diff --git a/tests/generic/303 b/tests/generic/303
new file mode 100755
index 000..c337483
--- /dev/null
+++ b/tests/generic/303
@@ -0,0 +1,99 @@
+#! /bin/bash
+# FS QA Test No. 303
+#
+# Check that high-offset reflinks work.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -rf "$tmp".* "$testdir"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_test_reflink
+_require_cp_reflink
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+testdir="$TEST_DIR/test-$seq"
+rm -rf "$testdir"
+mkdir "$testdir"
+
+echo "Create the original files"
+bigoff=9223372036854775806
+len=9223372036854775807
+bigoff_64k=9223372036854710272 # bigoff rounded down to 64k
+"$XFS_IO_PROG" -f -c "truncate $len" "$testdir/file0" >> "$seqres.full"
+test -s "$testdir/file0" || _notrun "High offset ftruncate failed"
+_pwrite_byte 0x61 $bigoff 1 "$testdir/file1" >> "$seqres.full"
+_pwrite_byte 0x61 1048575 1 "$testdir/file2" >> "$seqres.full"
+
+echo "Reflink large single byte file"
+_cp_reflink "$testdir/file1" "$testdir/file3" >> "$seqres.full"
+
+echo "Reflink large empty file"
+_cp_reflink "$testdir/file0" "$testdir/file4" >> "$seqres.full"
+
+echo "Reflink past maximum file size in dest file (should fail)"
+_reflink_range "$testdir/file1" 0 "$testdir/file5" 4611686018427322368 $len >> "$seqres.full"
+
+echo "Reflink high offset to low offset"
+_reflink_range "$testdir/file1" $bigoff_64k "$testdir/file6" 1048576 65535 >> "$seqres.full"
+
+echo "Reflink past source file EOF (should fail)"
+_reflink_range "$testdir/file2" 524288 "$testdir/file7" 0 1048576 >> "$seqres.full"
+
+echo "Reflink max size at nonzero offset (should fail)"
+_reflink_range "$testdir/file2" 524288 "$testdir/file8" 0 $len >> "$seqres.full"
+
+echo "Reflink with huge off/len (should fail)"
+_reflink_range "$testdir/file2" $bigoff_64k "$testdir/file9" 0 $bigoff_64k >> "$seqres.full"
+
+echo "Check file creation"
+_test_remount
+echo "file3"
+"$XFS_IO_PROG" -c "pread -v -q $bigoff 1" "$testdir/file3"
+echo "file4"
+"$XFS_IO_PROG" -c "pread -v -q $bigoff 1" "$testdir/file4"
+# file5 should fail above
+echo "file6"
+"$XFS_IO_PROG" -c "pread -v -q 1114110 1" "$testdir/file6"
+# file7 should fail above
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/303.out b/tests/generic/303.out
new file mode 100644
index 000..39a8803
--- /dev/null
+++ b/tests/generic/303.out
@@ -0,0 +1,21 @@
+QA output created by 303
+Format and mount
+Create the original files
+Reflink large single byte file
+Reflink large empty file
+Reflink past maximum file size in dest file (should fail)
+XFS_IOC_CLONE_RANGE: Invalid argument
+Reflink high offset to low offset
+Reflink past source file EOF (should fail)
+XFS_IOC_CLONE_RANGE: Invalid argument
+Reflink max size at nonzero offset (should fail)
+XFS_IOC_CLONE_RANGE: Invalid argument
+Reflink with huge off/len (should fail)
+XFS_IOC_CLONE_RANGE: Invalid argument
+Check file creation
+file3
+7ffe:  61  a
+file4
+7ffe:  00  .
+file6
+0010fffe:  61  a
diff --git a/tests/generic/304 

[PATCH 12/23] xfs/122: support refcount/rmap data structures

2016-02-08 Thread Darrick J. Wong
Include the refcount and rmap structures in the golden output.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/122 |3 +++
 tests/xfs/122.out |4 
 tests/xfs/group   |2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)


diff --git a/tests/xfs/122 b/tests/xfs/122
index e6697a2..758cb50 100755
--- a/tests/xfs/122
+++ b/tests/xfs/122
@@ -90,6 +90,9 @@ xfs_da3_icnode_hdr
 xfs_dir3_icfree_hdr
 xfs_dir3_icleaf_hdr
 xfs_name
+xfs_owner_info
+xfs_refcount_irec
+xfs_rmap_irec
 xfs_alloctype_t
 xfs_buf_cancel_t
 xfs_bmbt_rec_32_t
diff --git a/tests/xfs/122.out b/tests/xfs/122.out
index 8ba121e..c590166 100644
--- a/tests/xfs/122.out
+++ b/tests/xfs/122.out
@@ -75,6 +75,10 @@ sizeof(struct xfs_extent_data) = 24
 sizeof(struct xfs_extent_data_info) = 32
 sizeof(struct xfs_fs_eofblocks) = 128
 sizeof(struct xfs_icreate_log) = 28
+sizeof(struct xfs_refcount_key) = 4
+sizeof(struct xfs_refcount_rec) = 12
+sizeof(struct xfs_rmap_key) = 20
+sizeof(struct xfs_rmap_rec) = 24
 sizeof(xfs_agf_t) = 224
 sizeof(xfs_agfl_t) = 36
 sizeof(xfs_agi_t) = 336
diff --git a/tests/xfs/group b/tests/xfs/group
index f0c1c2b..abf1d33 100644
--- a/tests/xfs/group
+++ b/tests/xfs/group
@@ -119,7 +119,7 @@
 119 log v2log auto freeze dangerous
 120 fuzzers
 121 log auto quick
-122 other auto quick
+122 other auto quick clone
 123 fuzzers
 124 fuzzers
 125 fuzzers

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/23] reflink: ensure that we can handle reflinking a lot of extents

2016-02-08 Thread Darrick J. Wong
Update the existing stress tests to ensure that we can handle
reflinking the same block a million times, and that we can handle
reflinking a million different extents.  Add a couple of tests to ensure
that we can ^C and SIGKILL our way out of long-running reflinks.
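The new `src/punch-alternating.c` helper walks the file in steps of two `st_blksize` units. It punches the first unit of each pair with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE`, so the file keeps its size but alternates between data and holes. The following is a dry-run sketch of that offset loop only. It prints the (offset, length) pairs the C program would pass to `fallocate()` for an example size and block size, and it touches no file. `punch_plan` is a hypothetical name used only for illustration.

```shell
#!/bin/bash
# Dry run of the loop in src/punch-alternating.c:
#   for (offset = 0; offset < sz; offset += blksz * 2)
#           fallocate(fd, PUNCH_HOLE | KEEP_SIZE, offset, blksz);
# punch_plan is a hypothetical helper that only prints the ranges.
punch_plan() {
	local sz=$1        # file size in bytes (st_size in the C program)
	local blksz=$2     # punch unit in bytes (st_blksize in the C program)
	local offset=0
	while [ "$offset" -lt "$sz" ]; do
		echo "$offset $blksz"
		offset=$((offset + blksz * 2))
	done
}

# Example: a 6-block file with 4096-byte blocks.
punch_plan $((4096 * 6)) 4096
```

For that 24576-byte file the sketch prints three ranges (offsets 0, 8192 and 16384, each 4096 bytes long). That leaves blocks 1, 3 and 5 as isolated pieces of data.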

v2: Don't run the signal tests on NFS, as we cannot interrupt NFS
clone operations.

Signed-off-by: Darrick J. Wong 
[h...@lst.de: don't run on NFS]
Signed-off-by: Christoph Hellwig 
---
 .gitignore  |1 
 src/Makefile|2 -
 src/punch-alternating.c |   59 +++
 tests/generic/175   |   42 +++-
 tests/generic/175.out   |6 +++
 tests/generic/176   |   50 +++
 tests/generic/176.out   |4 +-
 tests/generic/297   |  101 +++
 tests/generic/297.out   |6 +++
 tests/generic/298   |  101 +++
 tests/generic/298.out   |6 +++
 tests/generic/group |6 ++-
 12 files changed, 334 insertions(+), 50 deletions(-)
 create mode 100644 src/punch-alternating.c
 create mode 100755 tests/generic/297
 create mode 100644 tests/generic/297.out
 create mode 100755 tests/generic/298
 create mode 100644 tests/generic/298.out


diff --git a/.gitignore b/.gitignore
index bbe7c1a..c98c7bf 100644
--- a/.gitignore
+++ b/.gitignore
@@ -115,6 +115,7 @@
 /src/aio-dio-regress/aiocp
 /src/aio-dio-regress/aiodio_sparse2
 /src/aio-dio-regress/aio-dio-eof-race
+/src/punch-alternating
 /src/cloner
 /src/renameat2
 /src/t_rename_overwrite
diff --git a/src/Makefile b/src/Makefile
index 48e6765..3110208 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -19,7 +19,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
 	bulkstat_unlink_test_modified t_dir_offset t_futimens t_immutable \
 	stale_handle pwrite_mmap_blocked t_dir_offset2 seek_sanity_test \
 	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec cloner \
-   renameat2 t_getcwd e4compact test-nextquota
+   renameat2 t_getcwd e4compact test-nextquota punch-alternating
 
 SUBDIRS =
 
diff --git a/src/punch-alternating.c b/src/punch-alternating.c
new file mode 100644
index 000..9566310
--- /dev/null
+++ b/src/punch-alternating.c
@@ -0,0 +1,59 @@
+/*
+ * Punch out every other block in a file.
+ * Copyright (C) 2016 Oracle.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "global.h"
+
+int main(int argc, char *argv[])
+{
+	struct stat	s;
+	off_t		offset;
+	int		fd;
+	blksize_t	blksz;
+	off_t		sz;
+	int		mode;
+	int		error;
+
+	if (argc != 2) {
+		printf("Usage: %s file\n", argv[0]);
+		printf("Punches every other block in the file.\n");
+		return 1;
+	}
+
+	fd = open(argv[1], O_WRONLY);
+	if (fd < 0)
+		goto err;
+
+	error = fstat(fd, &s);
+	if (error)
+		goto err;
+
+	sz = s.st_size;
+	blksz = s.st_blksize;
+
+	mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
+	for (offset = 0; offset < sz; offset += blksz * 2) {
+		error = fallocate(fd, mode, offset, blksz);
+		if (error)
+			goto err;
+	}
+
+	error = fsync(fd);
+	if (error)
+		goto err;
+
+	error = close(fd);
+	if (error)
+		goto err;
+	return 0;
+err:
+	perror(argv[1]);
+	return 2;
+}
diff --git a/tests/generic/175 b/tests/generic/175
index ac2f54f..0a6d5b8 100755
--- a/tests/generic/175
+++ b/tests/generic/175
@@ -1,12 +1,10 @@
 #! /bin/bash
 # FS QA Test No. 175
 #
-# Try to hit the maximum reference count (eek!)
-#
-# This test runs extremely slowly, so it's not automatically run anywhere.
+# See how well reflink handles reflinking the same block a million times.
 #
 #---
-# Copyright (c) 2015, Oracle and/or its affiliates.  All Rights Reserved.
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
 #
 # This program is free software; you can redistribute it and/or
 # modify it under the terms of the GNU General Public License as
@@ -34,7 +32,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _cleanup()
 {
 	cd /
-	rm -rf "$tmp".* "$testdir1"
+	rm -rf "$tmp".*
 }
 
 # get standard environment, filters and checks
@@ -58,40 +56,28 @@ testdir="$SCRATCH_MNT/test-$seq"
 rm -rf "$testdir"
 mkdir "$testdir"
 
-# Well let's hope the maximum reflink count is (less than (ha!)) 2^32...
-
 echo "Create a one block file"
 blksz="$(stat -f "$testdir" -c '%S')"
 _pwrite_byte 0x61 0 $blksz "$testdir/file1" >> "$seqres.full"
-_pwrite_byte 0x62 0 $blksz "$testdir/file2" >> "$seqres.full"
-_cp_reflink 

[PATCH 19/23] xfs: test rmapbt functionality

2016-02-08 Thread Darrick J. Wong
Signed-off-by: Darrick J. Wong 
---
 common/xfs|   44 ++
 tests/xfs/233 |   78 ++
 tests/xfs/233.out |6 +++
 tests/xfs/234 |   89 
 tests/xfs/234.out |6 +++
 tests/xfs/235 |  108 +
 tests/xfs/235.out |   14 +++
 tests/xfs/236 |   93 ++
 tests/xfs/236.out |8 
 tests/xfs/group   |4 ++
 10 files changed, 450 insertions(+)
 create mode 100644 common/xfs
 create mode 100755 tests/xfs/233
 create mode 100644 tests/xfs/233.out
 create mode 100755 tests/xfs/234
 create mode 100644 tests/xfs/234.out
 create mode 100755 tests/xfs/235
 create mode 100644 tests/xfs/235.out
 create mode 100755 tests/xfs/236
 create mode 100644 tests/xfs/236.out


diff --git a/common/xfs b/common/xfs
new file mode 100644
index 000..2d1a76f
--- /dev/null
+++ b/common/xfs
@@ -0,0 +1,44 @@
+##/bin/bash
+# Routines for handling XFS
+#---
+#  Copyright (c) 2015 Oracle.  All Rights Reserved.
+#  This program is free software; you can redistribute it and/or modify
+#  it under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 2 of the License, or
+#  (at your option) any later version.
+#
+#  This program is distributed in the hope that it will be useful,
+#  but WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+#  GNU General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with this program; if not, write to the Free Software
+#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+#  USA
+#
+#  Contact information: Oracle Corporation, 500 Oracle Parkway,
+#  Redwood Shores, CA 94065, USA, or: http://www.oracle.com/
+#---
+
+_require_xfs_test_rmapbt()
+{
+	_require_test
+
+	if [ "$(xfs_info "$TEST_DIR" | grep -c "rmapbt=1")" -ne 1 ]; then
+		_notrun "rmapbt not supported by test filesystem type: $FSTYP"
+	fi
+}
+
+_require_xfs_scratch_rmapbt()
+{
+	_require_scratch
+
+	_scratch_mkfs > /dev/null
+	_scratch_mount
+	if [ "$(xfs_info "$SCRATCH_MNT" | grep -c "rmapbt=1")" -ne 1 ]; then
+		_scratch_unmount
+		_notrun "rmapbt not supported by scratch filesystem type: $FSTYP"
+	fi
+	_scratch_unmount
+}
diff --git a/tests/xfs/233 b/tests/xfs/233
new file mode 100755
index 000..2e61275
--- /dev/null
+++ b/tests/xfs/233
@@ -0,0 +1,78 @@
+#! /bin/bash
+# FS QA Test No. 233
+#
+# Tests xfs_growfs on a rmapbt filesystem
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f "$tmp".*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/xfs
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs xfs
+_require_xfs_scratch_rmapbt
+
+echo "Format and mount"
+_scratch_mkfs -d size=$((2 * 4096 * 4096)) -l size=4194304 > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf "$testdir"
+mkdir "$testdir"
+
+echo "Create the original files"
+blksz="$(stat -f "$testdir" -c '%S')"
+_pwrite_byte 0x61 0 $((blksz * 14 + 71)) "$testdir/original" >> "$seqres.full"
+cp -p "$testdir/original" "$testdir/copy1"
+cp -p "$testdir/copy1" "$testdir/copy2"
+
+echo "Grow fs"
+"$XFS_GROWFS_PROG" "$SCRATCH_MNT" 2>&1 | _filter_growfs >> "$seqres.full"
+_scratch_remount
+
+echo "Create more copies"
+cp -p "$testdir/original" "$testdir/copy3"
+
+xfs_info "$SCRATCH_MNT" >> "$seqres.full"
+
+echo "Check scratch fs"
+umount "$SCRATCH_MNT"

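The sizes in test 233 are easier to follow once expanded: the format step asks for a 2 * 4096 * 4096 byte data section and a 4194304 byte log, and the original file is 14 blocks plus 71 bytes. The test measures blksz with stat at runtime, so the 1024 and 4096 used below are assumed example values, and file_bytes/file_blocks are hypothetical helper names.

```shell
#!/bin/bash
# Worked arithmetic for the sizes used by test 233 (not part of the test).
data_bytes=$((2 * 4096 * 4096))   # _scratch_mkfs -d size=...
log_bytes=4194304                 # _scratch_mkfs -l size=...
echo "data section: $data_bytes bytes ($((data_bytes / 1048576)) MiB)"
echo "log: $log_bytes bytes ($((log_bytes / 1048576)) MiB)"

# Size of "original": 14 full blocks plus a 71-byte tail.
file_bytes() {
	echo $(($1 * 14 + 71))
}

# Filesystem blocks touched by "original", rounding the tail up.
file_blocks() {
	local sz
	sz=$(file_bytes "$1")
	echo $(((sz + $1 - 1) / $1))
}

for bs in 1024 4096; do
	echo "blksz=$bs: $(file_bytes "$bs") bytes in $(file_blocks "$bs") blocks"
done
```

This prints a 32 MiB data section, a 4 MiB log, and 15 blocks for the original file at either block size, the last one only partly filled. Presumably the small data section is what leaves room on the scratch device for the xfs_growfs step.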
[PATCH 21/23] xfs: aio cow tests

2016-02-08 Thread Darrick J. Wong
Signed-off-by: Darrick J. Wong 
---
 tests/xfs/237     |  107
 tests/xfs/237.out |   12 ++
 tests/xfs/239     |   98
 tests/xfs/239.out |   11 +
 tests/xfs/240     |  109 +
 tests/xfs/240.out |   12 ++
 tests/xfs/241     |   99
 tests/xfs/241.out |   11 +
 tests/xfs/group   |    4 ++
 9 files changed, 463 insertions(+)
 create mode 100755 tests/xfs/237
 create mode 100644 tests/xfs/237.out
 create mode 100755 tests/xfs/239
 create mode 100644 tests/xfs/239.out
 create mode 100755 tests/xfs/240
 create mode 100644 tests/xfs/240.out
 create mode 100755 tests/xfs/241
 create mode 100644 tests/xfs/241.out


diff --git a/tests/xfs/237 b/tests/xfs/237
new file mode 100755
index 000..8288724
--- /dev/null
+++ b/tests/xfs/237
@@ -0,0 +1,107 @@
+#! /bin/bash
+# FS QA Test No. 237
+#
+# Test AIO DIO CoW behavior when the write temporarily fails.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -rf "$tmp".* "$testdir" "$TEST_DIR/moo"
+   _dmerror_cleanup
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+. ./common/dmerror
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+_require_dm_target error
+_require_xfs_io_command "cowextsize"
+_require_aiodio "aiocp"
+
+rm -f "$seqres.full"
+
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_dmerror_init
+_dmerror_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf "$testdir"
+mkdir "$testdir"
+
+blksz=65536
+nr=640
+bsz=128
+
+free_blocks=$(stat -f -c '%a' "$testdir")
+real_blksz=$(stat -f -c '%S' "$testdir")
+space_needed=$(((blksz * nr * 3) * 5 / 4))
+space_avail=$((free_blocks * real_blksz))
+test $space_needed -gt $space_avail && _notrun "Not enough space. $space_avail < $space_needed"
+
+echo "Create the original files"
+"$XFS_IO_PROG" -c "cowextsize $((blksz * bsz * 2))" "$testdir"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x61 -b $((blksz * bsz)) 0 $((blksz * nr))" "$testdir/file1" >> "$seqres.full"
+_cp_reflink "$testdir/file1" "$testdir/file2" >> "$seqres.full"
+_dmerror_unmount
+_dmerror_mount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+
+echo "CoW and unmount"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x63 $((blksz * bsz)) 1" "$testdir/file2" >> "$seqres.full"
+"$XFS_IO_PROG" -f -c "pwrite -S 0x63 -b $((blksz * bsz)) 0 $((blksz * nr))" "$TEST_DIR/moo" >> "$seqres.full"
+sync
+_dmerror_load_error_table
+"$AIO_TEST" -f DIRECT -b $((blksz * bsz)) "$TEST_DIR/moo" "$testdir/file2" >> "$seqres.full"
+_dmerror_load_working_table
+_dmerror_unmount
+_dmerror_mount
+
+echo "Compare files"
+md5sum "$testdir/file1" | _filter_scratch
+md5sum "$testdir/file2" | _filter_scratch
+
+echo "Check for damage"
+_dmerror_unmount
+_dmerror_cleanup
+_repair_scratch_fs >> "$seqres.full" 2>&1
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/237.out b/tests/xfs/237.out
new file mode 100644
index 000..c83dd8b
--- /dev/null
+++ b/tests/xfs/237.out
@@ -0,0 +1,12 @@
+QA output created by 237
+Format and mount
+Create the original files
+Compare files
+1886e67cf8783e89ce6ddc5bb09a3944  SCRATCH_MNT/test-237/file1
+1886e67cf8783e89ce6ddc5bb09a3944  SCRATCH_MNT/test-237/file2
+CoW and unmount
+write missed bytes expect 8388608 got 0
+Compare files
+1886e67cf8783e89ce6ddc5bb09a3944  SCRATCH_MNT/test-237/file1
+d94b0ab13385aba594411c174b1cc13c  SCRATCH_MNT/test-237/file2
+Check for damage
diff --git a/tests/xfs/239 b/tests/xfs/239
new file mode 100755
index 000..dfb1107
--- /dev/null
+++ b/tests/xfs/239
@@ -0,0 +1,98 @@
+#! /bin/bash
+# FS QA Test No. 239
+#
+# Test AIO DIO CoW behavior.
+#

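Test 237 skips itself unless the scratch filesystem has room for three times the file size plus 25%, and its expected output reports the size of one AIO buffer. With the test's default parameters (blksz=65536, nr=640, bsz=128) the numbers work out as below. enough_space is a hypothetical helper that restates the test's _notrun condition in the positive sense; the free-block figures in the check are assumed example values.

```shell
#!/bin/bash
# Worked numbers for test 237's free-space check (default parameters).
blksz=65536
nr=640
bsz=128

file_bytes=$((blksz * nr))                    # size of file1/file2
space_needed=$(((blksz * nr * 3) * 5 / 4))    # 3x file size, plus 25%
io_bytes=$((blksz * bsz))                     # -b value passed to pwrite and $AIO_TEST
cowext_bytes=$((blksz * bsz * 2))             # cowextsize hint set on $testdir

echo "file: $file_bytes, needed: $space_needed, io: $io_bytes, cowextsize: $cowext_bytes"

# Hypothetical helper: succeeds when the test would NOT call _notrun,
# given stat -f's free-block count (%a) and block size (%S).
enough_space() {
	local free_blocks=$1 real_blksz=$2
	[ "$space_needed" -le $((free_blocks * real_blksz)) ]
}

enough_space 38400 4096 && echo "38400 free 4k blocks is just enough"
```

The 8388608-byte I/O size is the same figure that appears in the "write missed bytes expect 8388608 got 0" line of 237.out, and 157286400 bytes is exactly 150 MiB of free space.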
[PATCH 22/23] xfs: test xfs_getbmapx behavior with shared extents

2016-02-08 Thread Darrick J. Wong
Make sure that xfs_getbmapx behaves properly w.r.t. shared extents
and CoW fork reporting.

Signed-off-by: Darrick J. Wong 
---
 common/xfs        |   19 ++
 tests/xfs/243     |  169 +
 tests/xfs/243.out |   27
 tests/xfs/245     |   99 +++
 tests/xfs/245.out |   13
 tests/xfs/group   |    2 +
 6 files changed, 329 insertions(+)
 create mode 100755 tests/xfs/243
 create mode 100644 tests/xfs/243.out
 create mode 100755 tests/xfs/245
 create mode 100644 tests/xfs/245.out


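Test 243 in this patch builds file3 out of blksz-sized slots whose type depends on the slot index modulo 5, with one "seq K 5 $nr" loop per type, and writes the bytes it expects to read back into file3.chk. The sketch restates that mapping; slot_kind and slot_byte are hypothetical names, not xfstests helpers.

```shell
#!/bin/bash
# Sketch: classify each blksz-sized slot of file3 in test 243 by its
# index modulo 5, mirroring the five "seq K 5 $nr" loops.
slot_kind() {
	case $(($1 % 5)) in
	0) echo "reflinked" ;;   # _reflink_range from file1 (0x61)
	1) echo "unwritten" ;;   # falloc'd, reads back as zeroes
	2) echo "hole" ;;        # never written, reads back as zeroes
	3) echo "regular" ;;     # written with 0x71 before the sync
	4) echo "delalloc" ;;    # written with 0x62 after the sync
	esac
}

# Byte value the test writes into file3.chk for each slot.
slot_byte() {
	case $(($1 % 5)) in
	0) echo 0x61 ;;
	1|2) echo 0x00 ;;
	3) echo 0x71 ;;
	4) echo 0x62 ;;
	esac
}

for f in 0 1 2 3 4 5; do
	echo "slot $f: $(slot_kind "$f") $(slot_byte "$f")"
done
```

One detail the mapping makes visible: with nr=64, "seq 4 5 $nr" ends at 64 itself, so the last delalloc write lands one slot past the truncated size in both file3 and file3.chk.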
diff --git a/common/xfs b/common/xfs
index 2d1a76f..91b7916 100644
--- a/common/xfs
+++ b/common/xfs
@@ -42,3 +42,22 @@ _require_xfs_scratch_rmapbt()
    fi
    _scratch_unmount
 }
+
+_xfs_bmapx_find() {
+   case "$1" in
+   "attr")
+      param="a"
+      ;;
+   "cow")
+      param="c"
+      ;;
+   *)
+      param="e"
+      ;;
+   esac
+   shift
+   file="$1"
+   shift
+
+   "$XFS_IO_PROG" -c "bmap -${param}lpv" "$file" | grep -c "$@"
+}
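_xfs_bmapx_find maps a fork name to an xfs_io bmap flag ("attr" to -a, "cow" to -c, anything else to -e), always appends -lpv, and passes its remaining arguments to grep -c. The offline sketch reproduces the flag mapping and the counting step without a filesystem. bmapx_cmd and bmapx_count are hypothetical names, the sample lines fed to grep are invented rather than real bmap output, and the note that -e adds delayed-allocation extents is our reading of xfs_io's bmap help, so treat it as an assumption.

```shell
#!/bin/bash
# Offline sketch of _xfs_bmapx_find's two steps (hypothetical names).

# Step 1: build the xfs_io command string for a given fork name.
bmapx_cmd() {
	local param
	case "$1" in
	"attr") param="a" ;;   # attribute fork
	"cow")  param="c" ;;   # CoW fork
	*)      param="e" ;;   # data fork; -e assumed to add delalloc extents
	esac
	echo "bmap -${param}lpv"
}

# Step 2: count matching lines; all arguments go straight to grep,
# just as the real helper passes "$@" through.
bmapx_count() {
	grep -c "$@"
}

bmapx_cmd cow
printf 'extent A shared\nextent B\nextent C shared\n' | bmapx_count shared
```

Because grep -c still prints 0 when nothing matches (while exiting with status 1), callers can compare the printed count directly. The sketch declares param local; the original helper assigns param and file as globals, which is harmless in a test script but would clobber a caller's variable of the same name.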
diff --git a/tests/xfs/243 b/tests/xfs/243
new file mode 100755
index 000..a97e87e
--- /dev/null
+++ b/tests/xfs/243
@@ -0,0 +1,169 @@
+#! /bin/bash
+# FS QA Test No. 243
+#
+# Ensure that copy on write in buffered mode works when the CoW
+# range originally covers multiple extents, some unwritten, some not.
+#   - Set cowextsize hint.
+#   - Create a file with the following repeating sequence of blocks:
+#     1. reflinked
+#     2. unwritten
+#     3. hole
+#     4. regular block
+#     5. delalloc
+#   - CoW across the halfway mark, starting with the unwritten extent.
+#   - Check that the files are now different where we say they're different.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -rf "$tmp".*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+. ./common/xfs
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "fpunch"
+_require_cp_reflink
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+rm -rf "$testdir"
+mkdir "$testdir"
+
+echo "Create the original files"
+blksz=65536
+nr=64
+real_blksz=$(stat -f -c '%S' "$testdir")
+internal_blks=$((blksz * nr / real_blksz))
+"$XFS_IO_PROG" -c "cowextsize $((blksz * 16))" "$testdir" >> "$seqres.full"
+_pwrite_byte 0x61 0 $((blksz * nr)) "$testdir/file1" >> "$seqres.full"
+"$XFS_IO_PROG" -f -c "truncate $((blksz * nr))" "$testdir/file3" >> "$seqres.full"
+# Blocks 0, 5, 10, ... are reflinked
+seq 0 5 $nr | while read f; do
+   _reflink_range "$testdir/file1" $((blksz * f)) "$testdir/file3" $((blksz * f)) $blksz >> "$seqres.full"
+   _pwrite_byte 0x61 $((blksz * f)) $blksz "$testdir/file3.chk" >> "$seqres.full"
+done
+# Blocks 1, 6, 11, ... are unwritten
+seq 1 5 $nr | while read f; do
+   "$XFS_IO_PROG" -f -c "falloc $((blksz * f)) $blksz" "$testdir/file3" >> "$seqres.full"
+   _pwrite_byte 0x00 $((blksz * f)) $blksz "$testdir/file3.chk" >> "$seqres.full"
+done
+# Blocks 2, 7, 12, ... are holes
+seq 2 5 $nr | while read f; do
+   _pwrite_byte 0x00 $((blksz * f)) $blksz "$testdir/file3.chk" >> "$seqres.full"
+done
+# Blocks 3, 8, 13, ... are regular
+seq 3 5 $nr | while read f; do
+   _pwrite_byte 0x71 $((blksz * f)) $blksz "$testdir/file3" >> "$seqres.full"
+   _pwrite_byte 0x71 $((blksz * f)) $blksz "$testdir/file3.chk" >> "$seqres.full"
+done
+sync
+# Blocks 4, 9, 14, ... are delalloc (written after the sync)
+seq 4 5 $nr | while read f; do
+   _pwrite_byte 0x62 $((blksz * f)) $blksz "$testdir/file3" >> "$seqres.full"
+   _pwrite_byte 0x62 $((blksz * f)) $blksz "$testdir/file3.chk" >> "$seqres.full"
+done
+
+# 10 blocks are cow'd
+seq 0 10 $((nr/2)) | while read f; do
+