Re: Is btrfs-convert able to deal with sparse files in an ext4 filesystem?

2017-04-01 Thread Duncan
Sean Greenslade posted on Sat, 01 Apr 2017 12:13:57 -0700 as excerpted:

> On Sat, Apr 01, 2017 at 11:48:50AM +0200, Kai Herlemann wrote:

>> I have some sparse files on my ext4 filesystem, mostly images of ext4
>> filesystems.
>> Is btrfs-convert (4.9.1) able to deal with sparse files or can that
>> cause any problems?
> 
> From personal experience, I would recommend not using btrfs-convert on
> ext4 partitions.

While I'd be extremely surprised if btrfs-convert didn't work on sparse 
files, since if it didn't it wouldn't be a general-purpose converter and 
thus wouldn't be suited to the purpose...

I must agree with Sean here, tho on general principles: btrfs-convert 
isn't something I'd either use myself or recommend to others.  Consider:

1) Btrfs is considered on this list to be stabilizing, not fully stable 
and mature.  While in general (that is, even on stable and mature 
filesystems) the real value of your data can be defined by whether you 
care enough about it to have backups of that data -- if you don't, you 
self-evidently care less about that data than the time, resources and 
hassle you're saving by NOT doing the backup[1] -- on a still stabilizing 
filesystem such as btrfs, that applies even more strongly.  If you don't 
have a backup and aren't ready to use it if necessary, you really ARE 
declaring that data to be of less value than the time/hassle/resource 
cost of making one.

2) It follows from #1 that (assuming you consider the data of reasonable 
value) you have backups, and are prepared to restore from them.  Which 
means you have the /space/ for that backup.

3) Which means there's very little reason to use a converter such as 
btrfs-convert, because you can just do a straightforward blow away the 
filesystem and restore from backup (or from the primaries or a secondary 
backup if it /is/ your backup).

4) In fact, since an in-place convert is almost certainly going to take 
more time than a blow-away and restore from backup, and the end result is 
pretty well guaranteed to be less optimally arranged in the new native 
format than a freshly created filesystem with data equally freshly copied 
over from backups or primary sources, there are pretty big reasons *NOT* 
to do an in-place convert.

5) And if you don't have current backups, then by creating a brand new 
btrfs in new space and copying over from your existing ext4, you 
"magically" create that recommended backup, since that ext4 can then be 
used as a backup for your new btrfs.  Of course you'll eventually need to 
update that backup, but meanwhile, it'll be a useful backup, should it be 
needed, while you're settling in on the new btrfs.  =:^)


Meanwhile, it can be noted that plain old cp has the -a/--archive option 
that makes using it for making and restoring backups easier, and it also 
has a --sparse option.  Back on reiserfs, I used to use the 
--sparse=always option for my backups here, without issue, tho on btrfs I 
use the compress (actually compress=lzo) mount option, which should 
compress sparse areas of files even if the files don't get created 
specifically as sparse files, so I don't worry about it on btrfs.
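
To illustrate the copy-over approach from #5 above, with device names and 
mountpoints as pure placeholders:

  $ mkfs.btrfs /dev/sdb1                           # the new btrfs
  $ mount -o compress=lzo /dev/sdb1 /mnt/newbtrfs
  $ cp -a --sparse=always /mnt/oldext4/. /mnt/newbtrfs/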

Tho if those ext4 images are to be actively used by VMs or are otherwise 
actively written to, on btrfs I'd consider using the nocow attribute for 
them; since that also disables btrfs compression, I'd consider sparse 
copying for them as well.  But that's an entirely different topic worthy of its own 
thread if your use-case requires it and you still have questions on it 
after doing your own research...
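
For what it's worth, that would look something like this (paths are 
examples, and note that the nocow attribute only takes effect on files 
created after it is set on the containing directory):

  $ mkdir /mnt/newbtrfs/vm-images
  $ chattr +C /mnt/newbtrfs/vm-images
  $ cp --sparse=always /mnt/oldext4/images/disk.img /mnt/newbtrfs/vm-images/
  $ lsattr /mnt/newbtrfs/vm-images/disk.img    # should show the C attribute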

---
[1] Backup:  Note that a backup that hasn't been tested to be actually 
restorable isn't yet a backup, only a potential backup, as the job of 
making a backup isn't complete until that backup has been tested to be 
restorable.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: mix ssd and hdd in single volume

2017-04-01 Thread Duncan
UGlee posted on Sat, 01 Apr 2017 14:06:11 +0800 as excerpted:

> We are working on a small NAS server for home user. The product is
> equipped with a small fast SSD (around 60-120GB) and a large HDD (2T to
> 4T).
> 
> We have two choices:
> 
> 1. using bcache to accelerate io operation
> 2. combining SSD and HDD into a single btrfs volume.
> 
> Bcache is certainly designed for our purpose. But bcache requires
> complex configuration and can only start from a clean disk. Also in our
> test on Ubuntu 16.04, data inconsistency was observed at least once,
> resulting in total loss of the HDD data.
> 
> So we wonder whether simply putting the SSD and HDD into a single btrfs
> volume, in whatever mode, would also make general read operations (mostly
> readdir and getxattr) significantly faster than a single HDD without the
> SSD.

At present, bcache, or possibly the lvmcache alternative, is the only 
recommended way of creating a single btrfs out of a mixed-size ssd/hdd 
multi-device setup.

The problem is that while such features have been considered, there's no present 
method of telling btrfs to use the smaller ssd for hotter content.  The 
btrfs chunk allocator simply doesn't have that option at present.

Which would leave you with the choice of single, raid1 or raid0 modes.  
Raid1 requires two copies on separate devices which would mean the extra 
space on the larger hdd would be wasted/unusable, and the read-mode 
mirror choice algorithm is purely even/odd PID-based so on single reads 
you'd have a 50% chance of fast ssd reads, 50% chance slow hdd.  With 
single mode the allocator allocates to the device with the most space 
available first, so until the free space equalized between the two, all 
chunks would end up on the larger/slower hdd.  And raid0 would allocate 
evenly (allocate-wide policy) to both, again wasting the extra space on 
the larger device while overall giving you only about the same speed as 
two hdds would, and only unpredictably, if ever, the full speed of the 
ssd.

The default two-device setup, FWIW, is raid1-mode metadata for safety, 
single-mode data.  
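
(If you did want to experiment with that anyway, the profiles can be 
chosen explicitly at mkfs time -- device names here are just examples:

  # raid1 metadata for safety, single-mode data across both devices
  $ mkfs.btrfs -m raid1 -d single /dev/ssd /dev/hdd
  $ btrfs filesystem df /mnt     # after mounting, shows the chosen profiles

but the allocation behavior described above applies regardless.)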

As you can see, none of those are ideal from a fast-small-ssd as cache to 
a large-slow-hdd perspective, thus the recommendation of bcache or 
lvmcache if that's what you want/need.
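
For reference, a minimal bcache setup sketch -- destructive to both 
devices, device names are examples only, and details vary with 
bcache-tools versions:

  $ make-bcache -C /dev/ssd_part       # format the SSD as the cache device
  $ make-bcache -B /dev/hdd            # format the HDD as the backing device
  $ bcache-super-show /dev/ssd_part | grep cset.uuid
  $ echo <cset-uuid> > /sys/block/bcache0/bcache/attach
  $ mkfs.btrfs /dev/bcache0            # then put btrfs on the cached device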

The other alternative, of course, is separate filesystems, using a 
combination of symlinks, partitioning and bind-mounts, to arrange for 
frequently accessed and performance-critical stuff such as root and /home 
to be on the smaller/faster ssd, while the larger/slower hdd is used for 
stuff like a user's multimedia partition/filesystem.  That's actually 
what I've done here and I'm *very* happy with the result, but it's the 
type of solution that must either be customized per-installation, or 
perhaps be set up by a special-purpose distro installer designed with that 
sort of use-case target in mind.  It's /not/ the sort of thing you can do 
in a NAS product and expect mass-market users to actually read and 
understand the docs in order to use the product in an optimal way.
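
As a very rough illustration of that sort of arrangement, with devices, 
paths and options as pure examples:

  # /etc/fstab: root and /home on the ssd, bulk data on the hdd,
  # with a bind mount exposing part of it under the user's home
  /dev/ssd2   /            btrfs  defaults,compress=lzo  0 0
  /dev/ssd3   /home        btrfs  defaults,compress=lzo  0 0
  /dev/hdd1   /srv/media   btrfs  defaults               0 0
  /srv/media/video  /home/user/video  none  bind         0 0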


Meanwhile, since you appear to be designing a mass-market product, it's 
worth mentioning that btrfs is considered, certainly by its devs and 
users on this list, to be "still stabilizing, not fully stable and 
mature."  As such, making and having backups at the ready for any data 
considered to be more valuable than the time and resources necessary to 
make those backups is strongly recommended, even more so than when the 
filesystem is considered stable and mature (tho certainly the rule 
applies even then, but try telling that to a mass-market user...).

Additionally, since btrfs /is/ still stabilizing, we recommend that users 
run relatively new kernels, we best support the latest kernels in either 
of the current kernel series (thus 4.10 and 4.9 at present) or the 
mainline LTS series (thus 4.9 and 4.4 at present), and further recommend 
that users at least loosely follow the list in order to keep up with 
current btrfs developments and possible issues they may confront.

That doesn't sound like a particularly good choice for a mass-market NAS 
product to me.  Of course there's rockstor and others out there already 
shipping such products, but they're risking their reputation and the 
safety of their customer's data in the process, altho there's certainly a 
few customers out there with the time, desire and technical know-how to 
ensure the recommended backups and following current kernel and list, and 
that's exactly the sort of people you'll find already here.  But that's 
not sufficiently mass-market to appeal to most vendors.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Btrfs Heatmap - v6

2017-04-01 Thread Hans van Kranenburg
Hi,

A few days ago, I tagged v6 of the Btrfs Heatmap utility, which
visualizes the usage of your btrfs filesystem:

https://github.com/knorrie/btrfs-heatmap

The main change is adapting to the python 3 only state of python-btrfs
v6. There's no functional difference compared to v5.

And... like python-btrfs, the btrfs-heatmap package is in the NEW queue
of Debian! W00t! Thanks again, kilobyte.

https://ftp-master.debian.org/new/btrfs-heatmap_6-1.html

Have fun! And, share some of your results, if you have nice pictures or
create timelapses of filesystems behaving or misbehaving. :)

-- 
Hans van Kranenburg


python-btrfs v6: python 3 only + what now?

2017-04-01 Thread Hans van Kranenburg
A few days ago, I tagged v6 of the python btrfs library:

https://github.com/knorrie/python-btrfs

== CHANGES ==

python-btrfs v6, Mar 24 2017
  * Only Python 3 supported from now on
  * IOCTLs: INO_LOOKUP, LOGICAL_INO, TREE_SEARCH_V2, IOC_BALANCE_V2,
IOC_BALANCE_CTL, BALANCE_PROGRESS
  * Data structures: InodeRef, DirItem, DirIndex, XAttrItem,
FileExtentItem, InodeExtref
  * Add a helper to retrieve free space tree extents
  * Check device error counters in the nagios plugin
  * Fixes:
- Not loading backreferences for extents was broken
- Handle IOCTL differences for different architectures
  * Examples added:
- Show directory contents (both the index and namehash list)
- Try to show a filename for data extents
- Show file information (inode, xattrs, etc, and list of extents)
- Show subvolumes

As usual, there's a wealth of information about everything that has been
added or changed in the git commit messages.

The biggest change this time is going python 3 only. Doing this really
improved the quality of my life. Trying to be 2 and 3 compatible was a
nice experiment, but having to test everything twice, playing whack a
mole schmack a mole all the time and living in the worst of both worlds
is not the best place to be. It was actually the reason that there was
no February release of python-btrfs. After making the decision I got the
thing going again.

My favourites:
 * The SEARCH V2 IOCTL
https://github.com/knorrie/python-btrfs/commit/28e10d2495bde44805c1e09393bd4ea82b018409
 * Holy sh*t over 9000 faster OMG wheee yolo!!!
https://github.com/knorrie/python-btrfs/commit/9b6407d164eaa79ffe0494fa0ec82632128c783f
 * The balance IOCTL family
https://github.com/knorrie/python-btrfs/commit/f37eebcda1d4a52b58868007fc09099198177135

== Debian packages! ==

And... the debian packages are currently in the NEW queue of Debian!
https://ftp-master.debian.org/new/python-btrfs_6-2.html
Big thanks to kilobyte for acting as mentor for my uploads! This means
that if accepted, we'll have it in unstable, and after the Stretch
release RSN, we can probably mirror all new versions in stretch-backports.

== What now? ==

Currently, I'm still researching free space fragmentation and how well
(*ehrm* how badly) btrfs handles reusing free space in allocated chunks.
Some progress is being made, many eyebrows were raised about the ssd,nossd
mount options, and I'll be adding an example soon that shows how to use
the new balance functions to feed block groups with bad free space
fragmentation patterns to balance in an effective way.

Besides that...

One of the things I learned when working on this project for the past
year is that development of features only gets done quickly and properly
when there's an actual use case that needs solving with them. The
btrfs-heatmap tool is a very good example of this.

Now, I'm wondering what the next thing to do will be:

* Learning how to build C extensions for python, to integrate
btrfs-progs C code and be able to do crazy things really easily from python?
* Starting implementation of offline access to filesystems, to be able
to explore an unmountable or quite unusable filesystem (bad leaf! bad
node! bad key order!) and be able to interactively repair things that
btrfs check cannot repair right now, or which cannot easily be repaired by
an automated program because it needs some human reasoning to get the
bytes back in the right place...
* Or, start writing documentation. I tend to be able to write quite good
documentation, but it's a really time consuming thing to do. Usually
when I write documentation and ask for feedback, there's none, which
either means it's perfect, or it's not being read by anyone. :D

Actually, starting to write documentation is the one I think I'd like to
work on first. This page keeps being one of my favourites:
http://lartc.org/howto/lartc.iproute2.explore.html -> "ip shows us our
links"... this page made me learn networking many years ago. I'd like to
write documentation about btrfs in this way. Just start somewhere, start
exploring and show the reader what metadata trees are, how they're
organized, where to find your subvolumes and files, etc... And
of course show working code that can display everything using only a few
lines of python.

I'm not perfectly sure about the target audience, but it would probably
be the curious sysadmin user, who's also a bit of a programmer, and who
wants to learn more about the inner workings of his/her btrfs
filesystem. And... of course the ones that just want to do something
instead of getting sick of building regexes to parse human readable
output of other tools!

And when the reader gets more interested and starts reading kernel
source code, all of the struct names, constants etc will make instant
sense, because I'm keeping them as similar as possible to the C code.

What would you think/want?

Wait, is anyone actually using this besides myself? (I have no idea...)

Moo,

-- 
Hans van Kranenburg

Re: force btrfs to release underlying block device(s)

2017-04-01 Thread Duncan
Glenn Washburn posted on Sat, 01 Apr 2017 00:58:19 -0500 as excerpted:

> I've run into a frustrating problem with a btrfs volume just now.  I
> have a USB drive which has many partitions, two of which are luks
> encrypted, which can be unlocked as a single, multi-device btrfs volume.
>  For some reason the drive logically disconnected at the USB protocol
> level, but not physically.  Then it reconnected.  This caused the mount
> point to be removed at the vfs layer, however I could not close the luks
> devices.
> 
> When looking in /sys/fs/btrfs, I see a directory with the UUID of the
> offending volume, which shows the luks devices under the devices
> directory.  So I presume the btrfs module is still holding references to
> the block devices, not allowing them to be closed.  I know I can do a
> "dmsetup remove --force" to force closing the luks devices, but I doubt
> that will cause the btrfs module to release the offending block devices.
>  So if I do that and then open the luks devices again and try to remount
> the btrfs volume, I'm guessing insanity will ensue.
> 
> I can't unload/reload the btrfs module because the root fs among others
> are using it.  Obviously, I can reboot, but that's a windows solution.
> Anyone have a solution to this issue?  Is anyone looking into ways to
> prevent this from happening?  I think this situation should be trivial
> to reproduce.

Short answer: This is yet another known point supporting "btrfs is still 
stabilizing and under heavy development, not fully stable and mature."

Longer...

This is a known issue on current btrfs.  ATM, btrfs has no notion of 
device disappearance -- it keeps trying to write updates to physically or 
lower-level-logically missing devices "forever", or at least until btrfs 
triggers an emergency read-only remount on the filesystem, but that 
doesn't free the device or dirty memory.

There are patches available as part of the global hot-spare patchset that 
give btrfs the notion of a dead device, so the hot-spare stuff can 
trigger auto-replacement with a hot-spare, but that's a long-term-merge-
target patchset that is currently back-burnered, with (AFAIK) no mainline 
merge target kernel in sight.  Meanwhile, from list posts it seems that 
patchset has bit-rotted and no longer applies as-is to current kernels.

So obviously the problem is known and will eventually be addressed, but 
just when is anyone's guess.  It's quite unlikely to be in the next 2-3 
kernel series, however, and could be several years out, altho the fact 
that someone had enough interest in it to create the patchset in the 
first place means it's reasonably likely to be seen within the 1-5 year 
timeframe, unlike wishlist items that don't even have RFC-level patches 
yet.

Tho another part of that patchset, the per-chunk availability check for 
degraded filesystems that allows writable mount of multi-device 
filesystems with single chunks, etc, as long as all chunks are available, 
has seen renewed activity recently as the problem it addresses, formerly 
two-device raid1 filesystems going read-only after one degraded-writable 
mount, has become an increasingly frequent list-reported problem.  That 
smaller patchset has, I believe, now been reviewed and is now in 
btrfs-next, scheduled for merge in 4.12.

That will definitely make the global-hot-spare patchset smaller and 
easier to (eventually) merge, as this part will have already been merged 
and thus no longer needs to be part of the global-hot-spare patchset.  
Conceivably, the device-tracking patches could similarly be broken out 
into a smaller patchset of their own, but without anything actually 
actively using them for anything, testing would be more difficult, and 
it's unclear they'd be separately merged.

But provided the per-chunk-availability check is merged in 4.12, it would 
move up my gut-feeling prediction on the global-hot-spare patchset it was 
part of a bit, to say 9 months to 3.5 years, from the otherwise 1-5 years 
prediction.

Of course as we've seen with the raid56 functionality, mainline merge 
doesn't necessarily mean it'll actually be usably stable any time soon.  
Most new features take at least a couple kernel cycles to stabilize after 
mainline merge, and a few, like raid56, take far longer and may never 
stabilize at least in anything close to original merge form.

IOW, patience is a virtue, particularly if you're not a kernel-level dev 
and thus can't really do much to help it along yourself, other than 
working with the devs to test once it's on the active merge schedule and 
after merge, to hopefully bring faster usable stability.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

2017-04-01 Thread Dave Chinner
On Thu, Mar 30, 2017 at 12:12:31PM -0400, J. Bruce Fields wrote:
> On Thu, Mar 30, 2017 at 07:11:48AM -0400, Jeff Layton wrote:
> > On Thu, 2017-03-30 at 08:47 +0200, Jan Kara wrote:
> > > Because if above is acceptable we could make reported i_version to be a sum
> > > of "superblock crash counter" and "inode i_version". We increment
> > > "superblock crash counter" whenever we detect unclean filesystem shutdown.
> > > That way after a crash we are guaranteed each inode will report new
> > > i_version (the sum would probably have to look like "superblock crash
> > > counter" * 65536 + "inode i_version" so that we avoid reusing possible
> > > i_version numbers we gave away but did not write to disk but still...).
> > > Thoughts?
> 
> How hard is this for filesystems to support?  Do they need an on-disk
> format change to keep track of the crash counter?

Yes. We'll need a version counter in the superblock, and we'll need to
know what the increment semantics are. 

The big question is how do we know there was a crash? The only thing
a journalling filesystem knows at mount time is whether it is clean
or requires recovery. Filesystems can require recovery for many
reasons that don't involve a crash (e.g. root fs is never unmounted
cleanly, so always requires recovery). Further, some filesystems may
not even know there was a crash at mount time because their
architecture always leaves a consistent filesystem on disk (e.g. COW
filesystems).

> I wonder if repeated crashes can lead to any odd corner cases.

Without defined, locked-down behaviour of the superblock counter,
corner cases will almost certainly exist...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Is btrfs-convert able to deal with sparse files in an ext4 filesystem?

2017-04-01 Thread Sean Greenslade
On Sat, Apr 01, 2017 at 11:48:50AM +0200, Kai Herlemann wrote:
> Hi,
> I have some sparse files on my ext4 filesystem, mostly images of
> ext4 filesystems.
> Is btrfs-convert (4.9.1) able to deal with sparse files or can that
> cause any problems?

From personal experience, I would recommend not using btrfs-convert on
ext4 partitions. I attempted it on a /home partition on one of my
machines, and while it did succeed in converting, the fs it produced had
weird issues that caused transaction failures and thus semi-frequent
remount-ro. Btrfs check, scrub, and balance were all unable to repair
the damage. I ended up recreating the partition from a backup.

As far as I know, there were no sparse files on this partition, either.

Just my one data point, for whatever it's worth.

--Sean



Re: is send/receive

2017-04-01 Thread Hugo Mills
On Sat, Apr 01, 2017 at 05:25:06PM +0200, Lukas Tribus wrote:
> Hello experts,
> 
> 
> quick question about btrfs send/receive:
> 
> Is btrfs send/receive prone to causing destination filesystem
> corruption/failure when the source filesystem is bogus (due to
> bugs, or due to other factors like memory bit-flips happening, both
> *within* the source filesystem)?
> 
> Or asked differently: does send/receive transfer metadata/tree
> (which may be corrupt) from source to destination?

   Not at that level, no.

   A send stream consists of a sequence of FS operations (copy, mkdir,
reflink, mv, snapshot, etc) which can be used to create a subvolume
which is logically equivalent to the source subvolume. Thus, if there
are broken invariants in the metadata of the source FS, they cannot be
recreated in the destination FS, because you can't create broken
metadata using those operations.
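
   (For reference, the usual pattern, with example paths, is roughly:

  $ btrfs subvolume snapshot -r /data /data/snap1
  $ btrfs send /data/snap1 | btrfs receive /backup

so the receiving side simply replays those operations into a fresh
subvolume.)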

> I would like to know if send/receive is *completely* appropriate
> for backing up data to a destination btrfs filesystem, or if there is a -
> even a one in a million - chance it may corrupt the destination FS.
> Would it be preferable to use a non-btrfs destination FS for backup
> purposes? What do you guys think?

   If you have corruption which results in otherwise valid POSIX
metadata (say, somehow a UID got changed, or some permissions bits got
set into a silly but valid configuration), then that could be
transferred. Corruption leading to a "broken" filesystem (say, an
undeletable directory), no.

> Also, if I understand correctly, checksums are not transferred
> through send/receive, therefore corruption while transferring is
> possible (just like with rsync), right?

   Correct.

   Hugo.

-- 
Hugo Mills | Dullest spy film ever: The Eastbourne Ultimatum
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |   The Thick of It




is send/receive

2017-04-01 Thread Lukas Tribus

Hello experts,


quick question about btrfs send/receive:

Is btrfs send/receive prone to causing destination filesystem 
corruption/failure when the source filesystem is bogus (due to bugs, 
or due to other factors like memory bit-flips happening, both *within* 
the source filesystem)?


Or asked differently: does send/receive transfer metadata/tree (which 
may be corrupt) from source to destination?



I would like to know if send/receive is *completely* appropriate for 
backing up data to a destination btrfs filesystem, or if there is a - even a 
one in a million - chance it may corrupt the destination FS. Would it be 
preferable to use a non-btrfs destination FS for backup purposes? What 
do you guys think?



Also, if I understand correctly, checksums are not transferred through 
send/receive, therefore corruption while transferring is possible (just 
like with rsync), right?




Thanks,

Lukas





Re: Shrinking a device - performance?

2017-04-01 Thread Peter Grandi
[ ... ]

>>>   $  D='btrfs f2fs gfs2 hfsplus jfs nilfs2 reiserfs udf xfs'
>>>   $  find $D -name '*.ko' | xargs size | sed 's/^  *//;s/ .*\t//g'
   text    filename
>>>   832719  btrfs/btrfs.ko
>>>   237952  f2fs/f2fs.ko
>>>   251805  gfs2/gfs2.ko
>>>   72731   hfsplus/hfsplus.ko
>>>   171623  jfs/jfs.ko
>>>   173540  nilfs2/nilfs2.ko
>>>   214655  reiserfs/reiserfs.ko
>>>   81628   udf/udf.ko
>>>   658637  xfs/xfs.ko

That was Linux AMD64.

> udf is 637K on Mac OS 10.6
> exfat is 75K on Mac OS 10.9
> msdosfs is 79K on Mac OS 10.9
> ntfs is 394K (That must be Paragon's ntfs for Mac)
...
> zfs is 1.7M (10.9)
> spl is 247K (10.9)

Similar on Linux AMD64 but smaller:

  $ size updates/dkms/*.ko | sed 's/^  *//;s/ .*\t//g'
  text    filename
  62005   updates/dkms/spl.ko
  184370  updates/dkms/splat.ko
  3879    updates/dkms/zavl.ko
  22688   updates/dkms/zcommon.ko
  1012212 updates/dkms/zfs.ko
  39874   updates/dkms/znvpair.ko
  18321   updates/dkms/zpios.ko
  319224  updates/dkms/zunicode.ko

> If they are somehow comparable even with the differences, 833K
> is not bad for btrfs compared to zfs. I did not look at the
> format of the file; it must be binary, but compression may be
> optional for third party kexts. So the kernel module sizes are
> large for both btrfs and zfs. Given the feature sets of both,
> is that surprising?

Not surprising and indeed I agree with the statement that
appeared earlier that "there are use cases that actually need
them". There are also use cases that need realtime translation
of file content from chinese to spanish, and one could add to
ZFS or Btrfs an extension to detect the language of text files
and invoke via HTTP Google Translate, for example with option
"translate=chinese-spanish" at mount time; or less flexibly
there are many use cases where B-Tree lookup of records in files
is useful, and it would be possible to add that to Btrfs or ZFS,
so that for example 'lseek(4,"Jane Smith",SEEK_KEY)' would be
possible, as in the ancient TSS/370 filesystem design.

But the question is about engineering, where best to implement
those "feature sets": in the kernel or higher levels. There is
no doubt for me that realtime language translation and seeking
by key can be added to a filesystem kernel module, and would
"work". The issue is a crudely technical one: "works" for an
engineer is not a binary state, but a statistical property over
a wide spectrum of cost/benefit tradeoffs.

Adding "feature sets" because "there are use cases that actually
need them" is fine, adding their implementation to the kernel
driver of a filesystem is quite a different proposition, which
may have downsides, as the implementations of those feature sets
may make code more complex and harder to understand and test,
never mind debug, even for the base features. But of course lots
of people know better :-).

But there is more; look again at some compiled code sizes as a
crude proxy for complexity, divided in two groups, both of
robust, full featured designs:

  1012212 updates/dkms/zfs.ko
  832719  btrfs/btrfs.ko
  658637  xfs/xfs.ko

  237952  f2fs/f2fs.ko
  173540  nilfs2/nilfs2.ko
  171623  jfs/jfs.ko
  81628   udf/udf.ko

The code size for JFS or NILFS2 or UDF is roughly 1/4 the code
size for XFS, yet there is little difference in functionality.
Compared to ZFS as to base functionality JFS lacks checksums and
snapshots (in theory it has subvolumes, but they are disabled),
but NILFS2 has snapshots and checksums (but does not verify them
on ordinary reads), and yet the code size is 1/6 that of ZFS.
ZFS also has RAID, but looking at the code size of the Linux MD
RAID modules I see rather smaller numbers. Even so ZFS has a
good reputation for reliability despite its amazing complexity,
but that is also because SUN invested heavily in massive release
engineering for it, and similarly for XFS.

Therefore my impression is that the filesystems in the first
group have a lot of cool features like compression or dedup
etc. that could have been implemented at user level, and having
them in the kernel is good "for marketing purposes, to win
box-ticking competitions".


Re: Do different btrfs volumes compete for CPU?

2017-04-01 Thread Peter Grandi
>> Approximately 16 hours ago I've run a script that deleted
>> >~100 snapshots and started quota rescan on a large
>> USB-connected btrfs volume (5.4 of 22 TB occupied now).

That "USB-connected is a rather bad idea. On the IRC channel
#Btrfs whenever someone reports odd things happening I ask "is
that USB?" and usually it is and then we say "good luck!" :-).

The issues are:

* The USB mass storage protocol is poorly designed in particular
  for error handling.
* The underlying USB protocol is very CPU intensive.
* Most importantly nearly all USB chipsets, both system-side
  and peripheral-side, are breathtakingly buggy, but this does
  not get noticed for most USB devices.

>> Quota rescan only completed just now, with 100% load from
>> [btrfs-transacti] throughout this period,

> [ ... ] are different btrfs volumes independent in terms of
> CPU, or are there some shared workers that can be point of
> contention?

As written that question is meaningless: despite the current
mania for "threads"/"threadlets" a filesystem driver is a
library, not a set of processes (all those '[btrfs-*]'
threadlets are somewhat misguided ways to do background
stuff).

The real problems here are:

* Qgroups are famously system CPU intensive, even if less so
  than in earlier releases, especially with subvolumes, so the
  16 hours CPU is both absurd and expected. I think that qgroups
  are still effectively unusable; if you don't actually need the
  numbers, see the workaround sketched just below this list.
* The scheduler gives excessive priority to kernel threads, so
  they can crowd out user processes. When for whatever reason
  the system CPU percentage rises, everything else usually
  suffers.
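
The workaround sketch for the qgroups point, with the mountpoint as an
example path -- if the quota numbers aren't actually needed, simply turn
them off:

  $ btrfs quota disable /mnt/usbvolume

That stops the rescan and accounting overhead entirely.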

> BTW, USB adapter used is this one (though storage array only
> supports USB 3.0):
> https://www.asus.com/Motherboard-Accessory/USB_31_TYPEA_CARD/

Only Intel/AMD USB chipsets and a few others are fairly
reliable, and for mass storage only with USB3 with UASP, which
is basically SATA-over-USB (more precisely SCSI-command-set over
USB). Your system-side card seems to be recent enough to do
UASP, but probably the peripheral-side chipset isn't. Things
are so bad with third-party chipsets that even several types of
add-on SATA and SAS cards are too buggy.
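
As an aside, a quick way on Linux to see whether an enclosure is
actually being driven via UASP is something like:

  $ lsusb -t

which shows, per device, whether the "uas" or the plain "usb-storage"
driver is bound.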


Is btrfs-convert able to deal with sparse files in an ext4 filesystem?

2017-04-01 Thread Kai Herlemann
Hi,
I have some sparse files on my ext4 filesystem, mostly images of
ext4 filesystems.
Is btrfs-convert (4.9.1) able to deal with sparse files or can that
cause any problems?

Thanks in advance,
Kai


Re: Shrinking a device - performance?

2017-04-01 Thread Kai Krakow
On Mon, 27 Mar 2017 20:06:46 +0500,
Roman Mamedov wrote:

> On Mon, 27 Mar 2017 16:49:47 +0200
> Christian Theune  wrote:
> 
> > Also: the idea of migrating on btrfs also has its downside - the
> > performance of “mkdir” and “fsync” is abysmal at the moment. I’m
> > waiting for the current shrinking job to finish but this is likely
> > limited to the “find free space” algorithm. We’re talking about a
> > few megabytes converted per second. Sigh.  
> 
> Btw since this is all on LVM already, you could set up lvmcache with
> a small SSD-based cache volume. Even some old 60GB SSD would work
> wonders for performance, and with the cache policy of "writethrough"
> you don't have to worry about its reliability (much).

That's maybe the best recommendation to speed things up. I'm using
bcache here for the same reasons (speeding up random workloads) and it
works wonders.

Tho, for such big storage I'd maybe recommend a bigger and newer SSD.
Bigger SSDs tend to last much longer. Just don't use the whole of it, to
allow for better wear leveling, and you'll get a final setup that
can serve the system much longer than just the period of migration.
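
For reference, a minimal lvmcache sketch, assuming the data already sits
on an LV called "data" in volume group "vg0" and /dev/ssd is the new
cache device (names are examples only; check lvmcache(7) for current
syntax):

  $ pvcreate /dev/ssd
  $ vgextend vg0 /dev/ssd
  $ lvcreate --type cache-pool -L 50G -n cache0 vg0 /dev/ssd
  $ lvconvert --type cache --cachepool vg0/cache0 --cachemode writethrough vg0/data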

-- 
Regards,
Kai

Replies to list-only preferred.




Re: backing up a file server with many subvolumes

2017-04-01 Thread Kai Krakow
On Mon, 27 Mar 2017 08:57:17 +0300,
Marat Khalili wrote:

> Just some consideration, since I've faced a similar but not exactly the
> same problem: use rsync, but create snapshots on the target machine. Blind
> rsync will destroy deduplication of your snapshots and take huge
> amount of storage, so it's not a solution. But you can rsync --inline
> your snapshots in chronological order to some folder and re-take
> snapshots of that folder, thus recreating your snapshots structure on
> target. Obviously, it can/should be automated.

I think it's --inplace and --no-whole-file...
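
In other words, the pattern would look roughly like this (paths are
examples, and the staging directory must itself be a btrfs subvolume for
the snapshot step to work):

  $ rsync -a --inplace --no-whole-file /source/ /backup/staging/
  $ btrfs subvolume snapshot -r /backup/staging /backup/snapshots/$(date +%Y%m%d)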

Apparently, rsync cannot detect moved files, which was a big deal for me
regarding deduplication, so I found another solution which is even
faster. See my other reply.

> On 26/03/17 06:00, J. Hart wrote:
> > I have a Btrfs filesystem on a backup server.  This filesystem has
> > a directory to hold backups for filesystems from remote machines.
> > In this directory is a subdirectory for each machine.  Under each
> > machine subdirectory is one directory for each filesystem
> > (ex /boot, /home, etc) on that machine.  In each filesystem
> > subdirectory are incremental snapshot subvolumes for that
> > filesystem.  The scheme is something like this:
> >
> > /backup/<machine>/<filesystem>/<snapshot>
> >
> > I'd like to try to back up (duplicate) the file server filesystem 
> > containing these snapshot subvolumes for each remote machine.  The 
> > problem is that I don't think I can use send/receive to do this. 
> > "Btrfs send" requires "read-only" snapshots, and snapshots are not 
> > recursive as yet.  I think there are too many subvolumes which
> > change too often to make doing this without recursion practical.
> >
> > Any thoughts would be most appreciated.
> >
> > J. Hart
> >



-- 
Regards,
Kai

Replies to list-only preferred.



Re: backing up a file server with many subvolumes

2017-04-01 Thread Kai Krakow
On Mon, 27 Mar 2017 07:53:17 -0400,
"Austin S. Hemmelgarn" wrote:

> > I'd like to try to back up (duplicate) the file server filesystem
> > containing these snapshot subvolumes for each remote machine.  The
> > problem is that I don't think I can use send/receive to do this.
> > "Btrfs send" requires "read-only" snapshots, and snapshots are not
> > recursive as yet.  I think there are too many subvolumes which
> > change too often to make doing this without recursion practical.
> >
> > Any thoughts would be most appreciated.  
> In general, I would tend to agree with everyone else so far if you
> have to keep your current setup.  Use rsync with the --inplace option
> to send data to a staging location, then snapshot that staging
> location to do the actual backup.
> 
> Now, that said, I could probably give some more specific advice if I
> had a bit more info on how you're actually storing the backups.
> There are three general ways you can do this with BTRFS and subvolumes:
> 1. Send/receive of snapshots from the system being backed up.
> 2. Use some other software to transfer the data into a staging
> location on the backup server, then snapshot that.
> 3. Use some other software to transfer the data, and have it handle
> snapshots instead of using BTRFS, possibly having it create
> subvolumes instead of directories at the top level for each system.

If you decide for (3), I can recommend borgbackup. It allows variable
block size deduplication across all backup sources, tho to fully get
that potential, your backups can only be done serially, not in parallel.
Borgbackup cannot access the same repository with two processes in
parallel, and deduplication is only per repository.
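
A minimal borg workflow, with the repository path and the sources as
examples, looks roughly like:

  $ borg init --encryption=repokey /backup/borgrepo
  $ borg create --stats /backup/borgrepo::server1-{now} /home /etc
  $ borg prune --keep-daily 7 --keep-weekly 4 /backup/borgrepo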

Another recommendation for backups is the 3-2-1 rule:

  * have at least 3 different copies of your data (that means, your
original data, the backup copy, and another backup copy, separated
in a way they cannot fail for the same reason)
  * use at least 2 different media (that also means: don't backup
btrfs to btrfs, and/or use 2 different backup techniques)
  * keep at least 1 external copy (maybe rsync to a remote location)

The 3 copy rule can be deployed by using different physical locations,
different device types, different media, and/or different backup
programs. So it's kind of entangled with the 2 and 1 rule.

-- 
Regards,
Kai

Replies to list-only preferred.



mix ssd and hdd in single volume

2017-04-01 Thread UGlee
We are working on a small NAS server for home user. The product is
equipped with a small fast SSD (around 60-120GB) and a large HDD (2T
to 4T).

We have two choices:

1. using bcache to accelerate io operation
2. combining SSD and HDD into a single btrfs volume.

Bcache is certainly designed for our purpose. But bcache requires
complex configuration and can only start from a clean disk. Also in our
test on Ubuntu 16.04, data inconsistency was observed at least once,
resulting in total loss of the HDD data.

So we wonder whether simply putting the SSD and HDD into a single btrfs
volume, in whatever mode, would also make general read operations (mostly
readdir and getxattr) significantly faster than a single HDD without the
SSD.