date:20160915

Re: [RFC] Preliminary BTRFS Encryption

2016-09-15 Thread Roman Mamedov

On Fri, 16 Sep 2016 11:12:13 +1000
Dave Chinner  wrote:

> > As of now these patch set supports encryption on per subvolume, as
> > managing properties on per subvolume is a kind of core to btrfs, which is
> > easier for data center solution-ing, seamlessly persistent and easy to
> > manage.
> 
> We've got dmcrypt for this sort of transparent "device level"
> encryption. Do we really need another btrfs layer that re-implements
> generic, robust, widely deployed, stable functionality?

"Btrfs subvolume-level" is far from "device-level", subvolumes are so
lightweight and dynamic that they are akin to regular directories for most
intents and purposes, not devices or partitions.

And yes I'd say (effectively) a directory-level encryption in an FS can be
useful; for example encrypting /home, but not the rest of the filesystem, or
any other scenarios where only some of the stored data needs to be encrypted,
and it's not known in advance what proportion, so it's not convenient to have
any static partition or LVM based bounds.

Currently this can be achieved with tools like encfs or ecryptfs -- so it's
those you'd want to measure Btrfs encryption against, not dmcrypt.

-- 
With respect,
Roman

pgpDncVUCrA04.pgp
Description: OpenPGP digital signature

Re: df -i shows 0 inodes 0 used 0 free on 4.4.0-36-generic Ubuntu 14 - Bug or not?

2016-09-15 Thread Duncan

GWB posted on Thu, 15 Sep 2016 18:58:24 -0500 as excerpted:

> I don't expect accurate data on a btrfs file system when using df, but
> after upgrading to kernel 4.4.0 I get the following:
> 
> $ df -i ...
> /dev/sdc3   0   0  0 - /home
> /dev/sdc4   0   0  0 - /vm0 ...
> 
> Where /dev/sdc3 and /dev/sdc4 are btrfs filesystems.
> 
> So is this a bug or not?

Not a bug.

Btrfs uses inodes, but unlike ext*, it creates them dynamically as-
needed, so showing inodes used vs. free simply makes no sense in btrfs 
context.

Now btrfs /does/ track data and metadata separately, creating chunks of 
each type, and it /is/ possible to have all otherwise free space already 
allocated to chunks of one type or the other and then run out of space in 
the one type of chunk while there's plenty of space in the other type of 
chunk, but that's quite a different concept, and btrfs fi usage (tho your 
v3.14 btrfs-progs will be too old for usage) or btrfs fi df coupled with 
btrfs fi show (the old way to get the same info), gives the information 
for that.

And in fact, the btrfs fi show for vm0 says 374.66 GiB size and used, so 
indeed, all space on that one is allocated.  Unfortunately you don't post 
the btrfs fi df for that one, so we can't tell where all that allocated 
space is going and whether it's actually used, but it's all allocated.  
You probably want to run a balance to get back some unallocated space.

Meanwhile, your kernel is 4.4.x LTS series so not bad there, but your 
userspace is extremely old, 3.12, making support a bit hard as some of 
the commands have changed (btrfs fi usage, for one, and I think the 
checker was still btrfsck in 3.12, while in current btrfs-progs, it's 
btrfs check).  I'd suggest updating that to at least something around the 
4.4 level to match the kernel, tho you can upgrade to the latest 4.7.2 
(don't try 4.6 or 4.7 previous to 4.7.2, or don't btrfs check --repair if 
you do, as there's a bug with it in those versions that's fixed in 4.7.2) 
if you like, as newer userspace is designed to work with older kernels as 
well.

Besides which, while old btrfs userspace isn't a big deal (other than 
translating back and forth between old style and new style commands) when 
your filesystems are running pretty much correctly, as in that case all 
userspace is doing in most cases is calling the kernel to do the real 
work anyway, it becomes a much bigger deal when something goes wrong, 
because it's userspace code that's executing with btrfs check or btrfs 
restore, and newer userspace knows about and can fix a LOT more problems 
than the really ancient 3.12.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Thoughts on btrfs RAID-1 for cold storage/archive?

2016-09-15 Thread Duncan

E V posted on Thu, 15 Sep 2016 11:48:13 -0400 as excerpted:

> I'm investigating using btrfs for archiving old data and offsite
> storage, essentially put 2 drives in btrfs RAID-1, copy the data to the
> filesystem and then unmount, remove a drive and take it to an offsite
> location. Remount the other drive -o ro,degraded until my systems slots
> fill up, then remove the local drive and put it on a shelf. I'd verify
> the file md5sums after data is written to the drive for piece of mind,
> but maybe a btrfs scrub would give the same assurances. Seem
> straightforward? Anything to look out for? Long term format stability
> seems good, right? Also, I like the idea of being able to pull the
> offsite drive back and scrub if the local drive ever has problems, a
> nice extra piece of mind we wouldn't get with ext4. Currently using the
> 4.1.32 kernel since the driver for the r750 card in our 45 drives system
> only supports up to 4.3 ATM.

As described I believe it should work fine.

Btrfs raid1 isn't like normal raid1 in some ways and in particular isn't 
designed to be mounted degraded, writable, long term, only temporarily, 
in ordered to replace a bad device.  As that's what I thought you were 
going to propose when I read the subject line, I was all ready to tell 
you no, don't try it and expect it to work, but of course you had 
something different in mind, only read-only mounting of the degraded 
raid1 (unless needed for scrub, etc), not mounting it writable, and as 
long as you are careful to do just that, only mount it read-only, you 
should be fine.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: unable to handle kernel paging request

2016-09-15 Thread Duncan

Mark Gavalda posted on Thu, 15 Sep 2016 22:12:57 +0200 as excerpted:

[Moved to bottom to retain quote/reply order.]

> On Thu, Sep 15, 2016 at 6:05 PM, Chris Mason  wrote:
>> On 09/15/2016 10:08 AM, Mark Gavalda wrote:
>>>
>>> Hi,
>>>
>>> Bumped into the following one today; kernel 4.4.0-36-generic Ubuntu
>>> 16.4.1; CPU went to 100% and only a hard restart solved the issue.
>>> Since then everything's back to normal.
>>>
>>> Please let me know how can I help get to the bottom of this?
>>
>>
>> I saw similar traces when tracking down this bug:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/
commit/?h=for-linus-4.8=cbd60aa7cd17d81a434234268c55192862147439
>>
>> It's flagged for stable, so you'll get it with the next stable update,
>> or you can apply it by hand and rebuild.

> Thanks, I can see it included in 4.8-rc6 but not the other branches.
> Will it get pulled later or is this a 4.8 only fix?

Flagged for stable means it's headed to the maintained stable branches 
(well, the ones to which the fix applies for regression fixes, but 
definitely 4.4 LTS series in this case since Chris Mason indicated it 
should apply in your case), not just current.

But stabilization policy says a patch must hit mainline current first, 
before it is eligible for older stable as well.  So it would be 
/expected/ to hit 4.8-rc, current development, first.  After that, given 
that it's already flagged for stable, it should eventually hit all the 
stable kernels to which it applies as well.  That can be right away, but 
if the stable maintainer (Greg K-H, normally) is backlogged due to just 
getting back from vacation or something, as sometimes happens, it can 
take a few weeks to work thru the backlog, so it can take a bit, as well.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Preliminary BTRFS Encryption

2016-09-15 Thread Dave Chinner

On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
> 
> This patchset adds btrfs encryption support.
> 
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
> 
> A design write-up is coming next, however here below is the quick example
> on the cli usage. Please try out, let me know if I have missed something.

Yup, that best practices say "do not roll your own encryption
infrastructure".

This is just my 2c worth - take it or leave it, don't other flaming.
Keep in mind that I'm not picking on btrfs here - I asked similar
hard questions about the proposed f2fs encryption implementation.
That was a "copy and snowflake" version of the ext4 encryption code -
they made changes and now we have generic code and common
functionality between ext4 and f2fs.

> Also would like to mention that a review from the security experts is due,
> which is important and I believe those review comments can be accommodated
> without major changes from here.

That's a fairly significant red flag to me - security reviews need
to be done at the design phase against specific threat models -
security review is not a code/implementation review...

The ext4 developers got this right by publishing threat models and
design docs, which got quite a lot of review and feedback before
code was published for review.

https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew

[small reorder of comments]

> As of now these patch set supports encryption on per subvolume, as
> managing properties on per subvolume is a kind of core to btrfs, which is
> easier for data center solution-ing, seamlessly persistent and easy to
> manage.

We've got dmcrypt for this sort of transparent "device level"
encryption. Do we really need another btrfs layer that re-implements
generic, robust, widely deployed, stable functionality?

What concerns me the most here is that it seems like that nothing
has been learnt from the btrfs RAID5/6 debacle. i.e. the btrfs
reimplementation of existing robust, stable, widely deployed
infrastructure was fatally flawed and despite regular corruption
reports they were ignored for, what, 2 years? And then a /user/
spent the time to isolate the problem, and now several months later
it still hasn't been fixed. I haven't seen any developer interest in
fixing it, either.

This meets the definition of unmaintained software, and it sets a
poor example for how complex new btrfs features might be maintained
in the long term. Encryption simply cannot be treated like this - it
has to be right, and it has to be well maintained.

So what is being done differently ito the RAID5/6 review process
this time that will make the new btrfs-specific encryption
implementation solid and have minimal risk of zero-day fatal flaws?
And how are you going to guarantee that it will be adequately
maintained several years down the track?

> Also yes, thanks for the emails, I hear, per file encryption and inline
> with vfs layer is also important, which is wip among other things in the
> list.

The generic file encryption code is solid, reviewed, tested and
already widely deployed via two separate filesystems. There is a
much wider pool of developers who will maintain it, reveiw changes
and know all the traps that a new implementation might fall into.
There's a much bigger safety net here, which significantly lowers
the risk of zero-day fatal flaws in a new implementation and of
flaws in future modifications and enhancements.

Hence, IMO, the first thing to do is implement and make the generic
file encryption support solid and robust, not tack it on as an
afterthought for the magic btrfs encryption pixies to take care of.

Indeed, with the generic file encryption, btrfs may not even need
the special subvolume encryption pixies. i.e. you can effectively
implement subvolume encryption via configuration of a multi-user
encryption key for each subvolume and apply it to the subvolume tree
root at creation time. Then only users with permission to unlock the
subvolume key can access it.

Once the generic file encryption is solid and fulfils the needs of
most users, then you can look to solving the less common threat
models that neither dmcrypt or per-file encryption address. Only if
the generic code cannot be expanded to address specific threat
models should you then implement something that is unique to
btrfs

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

df -i shows 0 inodes 0 used 0 free on 4.4.0-36-generic Ubuntu 14 - Bug or not?

2016-09-15 Thread GWB

I don't expect accurate data on a btrfs file system when using df, but
after upgrading to kernel 4.4.0 I get the following:

$ df -i
...
/dev/sdc3   0   0  0 - /home
/dev/sdc4   0   0  0 - /vm0
...

Where /dev/sdc3 and /dev/sdc4 are btrfs filesystems.

So is this a bug or not?  I ask because root (/dev/sdc3) began to
display the error message "no space left on device", which was
eventually cured by deleting old snapshots then btrfs fi sync and
btrfs balance.  fi show and fi df show space, even when df -i shows 0
inodes:

sudo btrfs fi show /
...
Label: none  uuid: 9acdb642-d743-4c2a-a59f-811022c2a2b0
Total devices 1 FS bytes used 23.86GiB
devid1 size 60.00GiB used 42.03GiB path /dev/sdc3


 sudo btrfs fi df /
Data, single: total=37.00GiB, used=21.11GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=5.00GiB, used=2.75GiB
unknown, single: total=512.00MiB, used=0.00

Please excuse my inexperience here; I don't know how to use btrfs
commands to show inodes.  btrfs inspect-internal will reference an
inode as near as I can tell, but won't list the used and free inodes
("free" may not be the correct word here, given btrfs architecture).

Many Thanks,

Gordon

Machine Specs:

% uname -a
Linux Bon-E 4.4.0-36-generic #55~14.04.1-Ubuntu SMP Fri Aug 12
11:49:30 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

% btrfs --version
Btrfs v3.12

% sudo btrfs fi show
Label: none  uuid: 9acdb642-d743-4c2a-a59f-811022c2a2b0
Total devices 1 FS bytes used 23.86GiB
devid1 size 60.00GiB used 42.03GiB path /dev/sdc3

%Label: vm0  uuid: 72539416-d30e-4a34-8b2d-b2369d1fb075
Total devices 1 FS bytes used 349.96GiB
devid1 size 374.66GiB used 374.66GiB path /dev/sdc4

dmesg does not appear to show anything useful for btrfs or device, but
mount shows:

% mount | grep btrfs
/dev/sdc3 on / type btrfs (rw,ssd,subvol=@)
/dev/sdc3 on /home type btrfs (rw,ssd,subvol=@home)
/dev/sdc4 on /vm0 type btrfs (rw,ssd,space_cache,compress=lzo,subvol=@vm0)
/dev/sdc3 on /mnt/btrfs-root type btrfs (rw)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: multi-device btrfs with single data mode and disk failure

2016-09-15 Thread Chris Murphy

On Thu, Sep 15, 2016 at 3:48 PM, Alexandre Poux  wrote:
>
> Le 15/09/2016 à 18:54, Chris Murphy a écrit :
>> On Thu, Sep 15, 2016 at 10:30 AM, Alexandre Poux  wrote:
>>> Thank you very much for your answers
>>>
>>> Le 15/09/2016 à 17:38, Chris Murphy a écrit :
 On Thu, Sep 15, 2016 at 1:44 AM, Alexandre Poux  wrote:
> Is it possible to do some king of a "btrfs delete missing" on this
> kind of setup, in order to recover access in rw to my other data, or
> I must copy all my data on a new partition
 That *should* work :) Except that your file system with 6 drives is
 too full to be shrunk to 5 drives. Btrfs will either refuse, or get
 confused, about how to shrink a nearly full 6 drive volume into 5.

 So you'll have to do one of three things:

 1. Add a 2+TB drive, then remove the missing one; OR
 2. btrfs replace is faster and is raid10 reliable; OR
 3. Read only scrub to get a file listing of bad files, then remount
 read-write degraded and delete them all. Now you maybe can do a device
 delete missing. But it's still a tight fit, it basically has to
 balance things out to get it to fit on an odd number of drives, it may
 actually not work even though there seems to be enough total space,
 there has to be enough space on FOUR drives.

>>> Are you sure you are talking about data in single mode ?
>>> I don't understand why you are talking about raid10,
>>> or the fact that it will have to rebalance everything.
>> Yeah sorry I got confused in that very last sentence. Single, it will
>> find space in 1GiB increments. Of course this fails because that data
>> doesn't exist anymore, but to start the operation it needs to be
>> possible.
> No problem
>>> Moreover, even in degraded mode I cannot mount it in rw
>>> It tells me
>>> "too many missing devices, writeable remount is not allowed"
>>> due to the fact I'm in single mode.
>> Oh you're in that trap. Well now you're stuck. I've had the case where
>> I could mount read write degraded with metadata raid1 and data single,
>> but it was good for only one mount and then I get the same message you
>> get and it was only possible to mount read only. At that point it's
>> totally suck unless you're adept at manipulating the file system with
>> a hex editor...
>>
>> Someone might have a patch somewhere that drops this check and lets
>> too many missing devices to mount anyway... I seem to recall this.
>> It'd be in the archives if it exists.
>>
>>
>>
>>> And as far as as know, btrfs replace and btrfs delete, are not supposed
>>> to work in read only...
>> It doesn't. Must be read write mounted.
>>
>>
>>> I would like to tell him forgot about the missing data, and give me back
>>> my partition.
>> This feature doesn't exist yet. I really want to see this, it'd be
>> great for ceph and gluster if the volume could lose a drive, report
>> all the missing files to the cluster file system, delete the device
>> and the file references, and then the cluster knows that brick doesn't
>> have those files and can replicate them somewhere else or even back to
>> the brick that had them.
>>
> So I found this patch : https://patchwork.kernel.org/patch/7014141/
>
> Does this seems ok ?

No idea I haven't tried it.

>
> So after patching my kernel with it,
> I should be able to mount in rw my partition, and thus,
> I will be able to do a btrfs delete missing
> Which will just forgot about the old disk and everything should be fine
> afterward ?

It will forget about the old disk but it will try to migrate all
metadata and data that was on that disk to the remaining drives; so
until you delete all files that are corrupt, you'll continue to get
corruption messages about them.

>
> Is this risky ? or not so much ?

Probably. If you care about the data, mount read only, back up what
you can, then see if you can fix it after that.

> The scrubing is almost finished, and as I was expecting, I lost no data
> at all.

Well I'd guess the device delete should work then, but I still have no
idea if that patch will let you mount it degraded read-write. Worth a
shot though, it'll save time.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Christoph Anton Mitterer

On Thu, 2016-09-15 at 14:20 -0400, Austin S. Hemmelgarn wrote:
> 3. Fsck should be needed only for un-mountable filesystems.  Ideally,
> we 
> should be handling things like Windows does.  Preform slightly
> better 
> checking when reading data, and if we see an error, flag the
> filesystem 
> for expensive repair on the next mount.

That philosophy also has some drawbacks:
- The user doesn't directly that anything went wrong. Thus errors may
even continue to accumulate and getting much worse if the fs would have
immediately gone ro and giving the user the chance to manually
intervene (possibly then with help from upstream).

- Any smart auto-magical™ repair may also just fail (and make things
worse, as the current --repair e.g. may). Not performing such auto-
repair, gives the user at least the possible chance to make a bitwise
copy of the whole fs, before trying any rescue operations.
This wouldn't be the case, if the user never noticed that something
happen, and the fs tries to repair things right at mounting.

So I think any such auto-repair should be used with extreme caution and
only in those cases where one is absolutely a 100% sure that the action
will help and just do good.



Cheers,
Chris.

smime.p7s
Description: S/MIME cryptographic signature

Re: Size of scrubbed Data

2016-09-15 Thread Chris Murphy

On Thu, Sep 15, 2016 at 9:48 AM, Stefan Malte Schumacher
 wrote:

>
> btrfs --version
> btrfs-progs v4.7.1

Upgrade to 4.7.2 or downgrade to 4.6.1 before using btrfs check; see
the changelog for details. I'm not recommending that you use btrfs
check, just staying this version of tools is not reliable for some
file systems.

>  btrfs fi df /mnt/btrfs-raid/
> Data, RAID1: total=6.17TiB, used=6.05TiB
> System, RAID1: total=32.00MiB, used=916.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=10.00GiB, used=8.14GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

The 3rd line is possibly rather dangerous in that it might mean
there's some tiny amount of system data that has only one copy on one
drive. And since it's a system chunk, if it's true that there's only
one copy, if it were damaged or lost, it'd take out the whole volume.

btrfs balance start -mconvert=raid1,soft 

See what that gets you, and then recheck with btrfs fi df or better
use btrfs fi us.

Unfortunately I don't have an answer for the original question.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Size of scrubbed Data

2016-09-15 Thread Stefan Malte Schumacher

Hi sash

How do I move the single system data to raid1? Dmesg doesnt show any
scrubbing errors, according to Smart all the disks are okay. I am not
using any any compression. How would I change freespacecache to v2 and
what benefits would it entail?

I think I need to add the following to my original question: How is
"total bytes scrubbed" calculated? On a Raid1, shouldnt it be exactly
two times the actual data size on the disk?

Yours
Stefan

2016-09-15 22:11 GMT+02:00  :
> Hi Stefan,
>
> 1st you should run an balance on system data to move the single data to
> raid1. imho.
> then do the scrub again.
> btw are there any scrubbing errors in dmesg? disks are ok?! any
> compression involved? changed freespacecache to v2?
>
>
> sash
>
>
>
> Am 15.09.2016 um 17:48 schrieb Stefan Malte Schumacher:
>> Hello
>>
>>
>> I have encountered a very strange phenomenon while using btrfs-scrub.
>> I believe it may be a result of replacing my old installation of
>> Debian Jessie with Debian Stretch, resulting in a Kernel Switch from
>> 3.16+63 to 4.6.0-1. I scrub my filesystem once a month and let anacron
>> send me the results. My filesystem, consisting of four four-gigabyte
>> drives with both data and metadata as RAID1 was reported as containing
>> nearly 12TiB of data in scrubs done in May, June, July and August. But
>> then it changed and suddenly shows only 9TiB in size, despite the fact
>> that I did not delete any large files. If I remember correctly my
>> switch from Debian Jessie to Stretch was around that time period.
>> Could someone explain this behavior to me? Was a new way of
>> calculating the size of scrubed data introduced? How can I check if I
>> have lost data? I have a backup, but only one generation and rsync
>> will by now have deleted files on the NAS, which might have lost on
>> the fileserver. According to the long and short self-tests, which I
>> run with smartmontools my drives are alright. How do I proceed?
>>
>>
>>
>> Yours
>>
>> Stefan
>>
>> uname -a
>> Linux mars 4.6.0-1-amd64 #1 SMP Debian 4.6.4-1 (2016-07-18) x86_64 GNU/Linux
>>
>> btrfs --version
>> btrfs-progs v4.7.1
>>
>>  btrfs fi show
>> Label: none  uuid: 8c668854-db5d-45a7-875d-43c4e82a829e
>> Total devices 4 FS bytes used 6.06TiB
>> devid1 size 3.64TiB used 3.09TiB path /dev/sde
>> devid2 size 3.64TiB used 3.09TiB path /dev/sdc
>> devid3 size 3.64TiB used 3.09TiB path /dev/sdd
>> devid4 size 3.64TiB used 3.09TiB path /dev/sda
>>
>>
>>  btrfs fi df /mnt/btrfs-raid/
>> Data, RAID1: total=6.17TiB, used=6.05TiB
>> System, RAID1: total=32.00MiB, used=916.00KiB
>> System, single: total=4.00MiB, used=0.00B
>> Metadata, RAID1: total=10.00GiB, used=8.14GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> Maybe this is also of use in identifying the problem:
>> grep btrfs *
>> grep: apt: Ist ein Verzeichnis
>> grep: cups: Ist ein Verzeichnis
>> dpkg.log:2016-09-03 15:20:16 upgrade btrfs-progs:amd64 4.7-1 4.7.1-1
>> dpkg.log:2016-09-03 15:20:16 status triggers-awaited btrfs-progs:amd64 4.7-1
>> dpkg.log:2016-09-03 15:20:16 status half-configured btrfs-progs:amd64 4.7-1
>> dpkg.log:2016-09-03 15:20:16 status unpacked btrfs-progs:amd64 4.7-1
>> dpkg.log:2016-09-03 15:20:16 status half-installed btrfs-progs:amd64 4.7-1
>> dpkg.log:2016-09-03 15:20:16 status half-installed btrfs-progs:amd64 4.7-1
>> dpkg.log:2016-09-03 15:20:17 status unpacked btrfs-progs:amd64 4.7.1-1
>> dpkg.log:2016-09-03 15:20:17 status unpacked btrfs-progs:amd64 4.7.1-1
>> dpkg.log:2016-09-03 15:20:45 configure btrfs-progs:amd64 4.7.1-1 
>> dpkg.log:2016-09-03 15:20:45 status unpacked btrfs-progs:amd64 4.7.1-1
>> dpkg.log:2016-09-03 15:20:45 status unpacked btrfs-progs:amd64 4.7.1-1
>> dpkg.log:2016-09-03 15:20:45 status half-configured btrfs-progs:amd64 4.7.1-1
>> dpkg.log:2016-09-03 15:20:46 status triggers-awaited btrfs-progs:amd64 
>> 4.7.1-1
>> dpkg.log:2016-09-03 15:20:51 status installed btrfs-progs:amd64 4.7.1-1
>> dpkg.log.1:2016-08-10 16:58:23 upgrade btrfs-progs:amd64 4.5.2-1 4.6.1-1
>> dpkg.log.1:2016-08-10 16:58:23 status triggers-awaited btrfs-progs:amd64 
>> 4.5.2-1
>> dpkg.log.1:2016-08-10 16:58:23 status half-configured btrfs-progs:amd64 
>> 4.5.2-1
>> dpkg.log.1:2016-08-10 16:58:23 status unpacked btrfs-progs:amd64 4.5.2-1
>> dpkg.log.1:2016-08-10 16:58:23 status half-installed btrfs-progs:amd64 
>> 4.5.2-1
>> dpkg.log.1:2016-08-10 16:58:24 status half-installed btrfs-progs:amd64 
>> 4.5.2-1
>> dpkg.log.1:2016-08-10 16:58:24 status unpacked btrfs-progs:amd64 4.6.1-1
>> dpkg.log.1:2016-08-10 16:58:24 status unpacked btrfs-progs:amd64 4.6.1-1
>> dpkg.log.1:2016-08-10 17:01:25 configure btrfs-progs:amd64 4.6.1-1 
>> dpkg.log.1:2016-08-10 17:01:25 status unpacked btrfs-progs:amd64 4.6.1-1
>> dpkg.log.1:2016-08-10 17:01:26 status unpacked btrfs-progs:amd64 4.6.1-1
>> dpkg.log.1:2016-08-10 17:01:26 status half-configured btrfs-progs:amd64 
>> 4.6.1-1
>> dpkg.log.1:2016-08-10

Re: Size of scrubbed Data

2016-09-15 Thread g6094199

Hi Stefan,

1st you should run an balance on system data to move the single data to
raid1. imho.
then do the scrub again.
btw are there any scrubbing errors in dmesg? disks are ok?! any
compression involved? changed freespacecache to v2?


sash



Am 15.09.2016 um 17:48 schrieb Stefan Malte Schumacher:
> Hello
>
>
> I have encountered a very strange phenomenon while using btrfs-scrub.
> I believe it may be a result of replacing my old installation of
> Debian Jessie with Debian Stretch, resulting in a Kernel Switch from
> 3.16+63 to 4.6.0-1. I scrub my filesystem once a month and let anacron
> send me the results. My filesystem, consisting of four four-gigabyte
> drives with both data and metadata as RAID1 was reported as containing
> nearly 12TiB of data in scrubs done in May, June, July and August. But
> then it changed and suddenly shows only 9TiB in size, despite the fact
> that I did not delete any large files. If I remember correctly my
> switch from Debian Jessie to Stretch was around that time period.
> Could someone explain this behavior to me? Was a new way of
> calculating the size of scrubed data introduced? How can I check if I
> have lost data? I have a backup, but only one generation and rsync
> will by now have deleted files on the NAS, which might have lost on
> the fileserver. According to the long and short self-tests, which I
> run with smartmontools my drives are alright. How do I proceed?
>
>
>
> Yours
>
> Stefan
>
> uname -a
> Linux mars 4.6.0-1-amd64 #1 SMP Debian 4.6.4-1 (2016-07-18) x86_64 GNU/Linux
>
> btrfs --version
> btrfs-progs v4.7.1
>
>  btrfs fi show
> Label: none  uuid: 8c668854-db5d-45a7-875d-43c4e82a829e
> Total devices 4 FS bytes used 6.06TiB
> devid1 size 3.64TiB used 3.09TiB path /dev/sde
> devid2 size 3.64TiB used 3.09TiB path /dev/sdc
> devid3 size 3.64TiB used 3.09TiB path /dev/sdd
> devid4 size 3.64TiB used 3.09TiB path /dev/sda
>
>
>  btrfs fi df /mnt/btrfs-raid/
> Data, RAID1: total=6.17TiB, used=6.05TiB
> System, RAID1: total=32.00MiB, used=916.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=10.00GiB, used=8.14GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Maybe this is also of use in identifying the problem:
> grep btrfs *
> grep: apt: Ist ein Verzeichnis
> grep: cups: Ist ein Verzeichnis
> dpkg.log:2016-09-03 15:20:16 upgrade btrfs-progs:amd64 4.7-1 4.7.1-1
> dpkg.log:2016-09-03 15:20:16 status triggers-awaited btrfs-progs:amd64 4.7-1
> dpkg.log:2016-09-03 15:20:16 status half-configured btrfs-progs:amd64 4.7-1
> dpkg.log:2016-09-03 15:20:16 status unpacked btrfs-progs:amd64 4.7-1
> dpkg.log:2016-09-03 15:20:16 status half-installed btrfs-progs:amd64 4.7-1
> dpkg.log:2016-09-03 15:20:16 status half-installed btrfs-progs:amd64 4.7-1
> dpkg.log:2016-09-03 15:20:17 status unpacked btrfs-progs:amd64 4.7.1-1
> dpkg.log:2016-09-03 15:20:17 status unpacked btrfs-progs:amd64 4.7.1-1
> dpkg.log:2016-09-03 15:20:45 configure btrfs-progs:amd64 4.7.1-1 
> dpkg.log:2016-09-03 15:20:45 status unpacked btrfs-progs:amd64 4.7.1-1
> dpkg.log:2016-09-03 15:20:45 status unpacked btrfs-progs:amd64 4.7.1-1
> dpkg.log:2016-09-03 15:20:45 status half-configured btrfs-progs:amd64 4.7.1-1
> dpkg.log:2016-09-03 15:20:46 status triggers-awaited btrfs-progs:amd64 4.7.1-1
> dpkg.log:2016-09-03 15:20:51 status installed btrfs-progs:amd64 4.7.1-1
> dpkg.log.1:2016-08-10 16:58:23 upgrade btrfs-progs:amd64 4.5.2-1 4.6.1-1
> dpkg.log.1:2016-08-10 16:58:23 status triggers-awaited btrfs-progs:amd64 
> 4.5.2-1
> dpkg.log.1:2016-08-10 16:58:23 status half-configured btrfs-progs:amd64 
> 4.5.2-1
> dpkg.log.1:2016-08-10 16:58:23 status unpacked btrfs-progs:amd64 4.5.2-1
> dpkg.log.1:2016-08-10 16:58:23 status half-installed btrfs-progs:amd64 4.5.2-1
> dpkg.log.1:2016-08-10 16:58:24 status half-installed btrfs-progs:amd64 4.5.2-1
> dpkg.log.1:2016-08-10 16:58:24 status unpacked btrfs-progs:amd64 4.6.1-1
> dpkg.log.1:2016-08-10 16:58:24 status unpacked btrfs-progs:amd64 4.6.1-1
> dpkg.log.1:2016-08-10 17:01:25 configure btrfs-progs:amd64 4.6.1-1 
> dpkg.log.1:2016-08-10 17:01:25 status unpacked btrfs-progs:amd64 4.6.1-1
> dpkg.log.1:2016-08-10 17:01:26 status unpacked btrfs-progs:amd64 4.6.1-1
> dpkg.log.1:2016-08-10 17:01:26 status half-configured btrfs-progs:amd64 
> 4.6.1-1
> dpkg.log.1:2016-08-10 17:01:26 status triggers-awaited btrfs-progs:amd64 
> 4.6.1-1
> dpkg.log.1:2016-08-10 17:02:34 status installed btrfs-progs:amd64 4.6.1-1
> dpkg.log.1:2016-08-19 00:45:05 upgrade btrfs-progs:amd64 4.6.1-1 4.7-1
> dpkg.log.1:2016-08-19 00:45:05 status triggers-awaited btrfs-progs:amd64 
> 4.6.1-1
> dpkg.log.1:2016-08-19 00:45:05 status half-configured btrfs-progs:amd64 
> 4.6.1-1
> dpkg.log.1:2016-08-19 00:45:05 status unpacked btrfs-progs:amd64 4.6.1-1
> dpkg.log.1:2016-08-19 00:45:05 status half-installed btrfs-progs:amd64 4.6.1-1
> dpkg.log.1:2016-08-19 00:45:06 status half-installed btrfs-progs:amd64

Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Chris Murphy

On Thu, Sep 15, 2016 at 2:16 PM, Hugo Mills  wrote:
> On Thu, Sep 15, 2016 at 01:02:43PM -0600, Chris Murphy wrote:
>> On Thu, Sep 15, 2016 at 12:20 PM, Austin S. Hemmelgarn
>>  wrote:
>>
>> > 2. We're developing new features without making sure that check can fix
>> > issues in any associated metadata.  Part of merging a new feature needs to
>> > be proving that fsck can handle fixing any issues in the metadata for that
>> > feature short of total data loss or complete corruption.
>> >
>> > 3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
>> > should be handling things like Windows does.  Preform slightly better
>> > checking when reading data, and if we see an error, flag the filesystem for
>> > expensive repair on the next mount.
>>
>> Right, well I'm vaguely curious why ZFS, as different as it is,
>> basically take the position that if the hardware went so batshit that
>> they can't unwind it on a normal mount, then an fsck probably can't
>> help either... they still don't have an fsck and don't appear to want
>> one.
>>
>> I'm not sure if the brfsck is really all that helpful to user as much
>> as it is for developers to better learn about the failure vectors of
>> the file system.
>>
>>
>> > 4. Btrfs check should know itself if it can fix something or not, and that
>> > should be reported.  I have an otherwise perfectly fine filesystem that
>> > throws some (apparently harmless) errors in check, and check can't repair
>> > them.  Despite this, it gives zero indication that it can't repair them,
>> > zero indication that it didn't repair them, and doesn't even seem to give a
>> > non-zero exit status for this filesystem.
>>
>> Yeah, it's really not a user tool in my view...
>>
>>
>>
>> >
>> > As far as the other tools:
>> > - Self-repair at mount time: This isn't a repair tool, if the FS mounts,
>> > it's not broken, it's just a messy and the kernel is tidying things up.
>> > - btrfsck/btrfs check: I think I covered the issues here well.
>> > - Mount options: These are mostly just for expensive checks during mount,
>> > and most people should never need them except in very unusual 
>> > circumstances.
>> > - btrfs rescue *: These are all fixes for very specific issues.  They 
>> > should
>> > be folded into check with special aliases, and not be separate tools.  The
>> > first fixes an issue that's pretty much non-existent in any modern kernel,
>> > and the other two are for very low-level data recovery of horribly broken
>> > filesystems.
>> > - scrub: This is a very purpose specific tool which is supposed to be part
>> > of regular maintainence, and only works to fix things as a side effect of
>> > what it does.
>> > - balance: This is also a relatively purpose specific tool, and again only
>> > fixes things as a side effect of what it does.
>
>You've forgotten btrfs-zero-log, which seems to have built itself a
> reputation on the internet as the tool you run to fix all btrfs ills,
> rather than a very finely-targeted tool that was introduced to deal
> with approximately one bug somewhere back in the 2.x era (IIRC).
>
>Hugo.

:-) It's in my original list, and it's in Austin's by way of being
lumped into 'btrfs rescue *' along with chunk and super recover. Seems
like super recover should be built into Btrfs check, and would be one
of the first ambiguities to get out of the way but I'm just an ape
that wears pants so what do I know.

Thing is?? zero log has fixed file systems in cases where I never
would have expected it to, and the user was recommended not to use it,
or use it as a 2nd to last resort. So, pfffIt's like throwing salt
around.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Hugo Mills

On Thu, Sep 15, 2016 at 01:02:43PM -0600, Chris Murphy wrote:
> On Thu, Sep 15, 2016 at 12:20 PM, Austin S. Hemmelgarn
>  wrote:
> 
> > 2. We're developing new features without making sure that check can fix
> > issues in any associated metadata.  Part of merging a new feature needs to
> > be proving that fsck can handle fixing any issues in the metadata for that
> > feature short of total data loss or complete corruption.
> >
> > 3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
> > should be handling things like Windows does.  Preform slightly better
> > checking when reading data, and if we see an error, flag the filesystem for
> > expensive repair on the next mount.
> 
> Right, well I'm vaguely curious why ZFS, as different as it is,
> basically take the position that if the hardware went so batshit that
> they can't unwind it on a normal mount, then an fsck probably can't
> help either... they still don't have an fsck and don't appear to want
> one.
> 
> I'm not sure if the brfsck is really all that helpful to user as much
> as it is for developers to better learn about the failure vectors of
> the file system.
> 
> 
> > 4. Btrfs check should know itself if it can fix something or not, and that
> > should be reported.  I have an otherwise perfectly fine filesystem that
> > throws some (apparently harmless) errors in check, and check can't repair
> > them.  Despite this, it gives zero indication that it can't repair them,
> > zero indication that it didn't repair them, and doesn't even seem to give a
> > non-zero exit status for this filesystem.
> 
> Yeah, it's really not a user tool in my view...
> 
> 
> 
> >
> > As far as the other tools:
> > - Self-repair at mount time: This isn't a repair tool, if the FS mounts,
> > it's not broken, it's just a messy and the kernel is tidying things up.
> > - btrfsck/btrfs check: I think I covered the issues here well.
> > - Mount options: These are mostly just for expensive checks during mount,
> > and most people should never need them except in very unusual circumstances.
> > - btrfs rescue *: These are all fixes for very specific issues.  They should
> > be folded into check with special aliases, and not be separate tools.  The
> > first fixes an issue that's pretty much non-existent in any modern kernel,
> > and the other two are for very low-level data recovery of horribly broken
> > filesystems.
> > - scrub: This is a very purpose specific tool which is supposed to be part
> > of regular maintainence, and only works to fix things as a side effect of
> > what it does.
> > - balance: This is also a relatively purpose specific tool, and again only
> > fixes things as a side effect of what it does.

   You've forgotten btrfs-zero-log, which seems to have built itself a
reputation on the internet as the tool you run to fix all btrfs ills,
rather than a very finely-targeted tool that was introduced to deal
with approximately one bug somewhere back in the 2.x era (IIRC).

   Hugo.

> 
> Yeah I know, it's just much of this is non-obvious to users unfamiliar
> with this file system. And even I'm often throwing spaghetti on a
> wall.
> 
> 
> -- 
> Chris Murphy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Hugo Mills | It's against my programming to impersonate a deity!
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |  C3PO, Return of the Jedi


signature.asc
Description: Digital signature

Re: unable to handle kernel paging request

2016-09-15 Thread Mark Gavalda

Thanks, I can see it included in 4.8-rc6 but not the other branches.
Will it get pulled later or is this a 4.8 only fix?

Mark

On Thu, Sep 15, 2016 at 6:05 PM, Chris Mason  wrote:
> On 09/15/2016 10:08 AM, Mark Gavalda wrote:
>>
>> Hi,
>>
>> Bumped into the following one today; kernel 4.4.0-36-generic Ubuntu
>> 16.4.1; CPU went to 100% and only a hard restart solved the issue.
>> Since then everything's back to normal.
>>
>> Please let me know how can I help get to the bottom of this?
>
>
> I saw similar traces when tracking down this bug:
>
> https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?h=for-linus-4.8=cbd60aa7cd17d81a434234268c55192862147439
>
> It's flagged for stable, so you'll get it with the next stable update, or
> you can apply it by hand and rebuild.
>
> -chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Expose verbose flag on subvolume delete

2016-09-15 Thread Vincent Batts

Exposing the verbose flag that already had the logic for verbose output

Vincent Batts (1):
  btrfs-progs: subvolume verbose delete flag

 cmds-subvolume.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs-progs: subvolume verbose delete flag

2016-09-15 Thread Vincent Batts

There was already the logic for verbose output, but the flag parsing did
not include it.

Signed-off-by: Vincent Batts 
---
 cmds-subvolume.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index e7ef67d..f8c9f48 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -245,6 +245,7 @@ static const char * const cmd_subvol_delete_usage[] = {
"",
"-c|--commit-after  wait for transaction commit at the end of the 
operation",
"-C|--commit-each   wait for transaction commit after deleting each 
subvolume",
+   "-v|--verbose   verbose output of operations",
NULL
 };
 
@@ -267,10 +268,11 @@ static int cmd_subvol_delete(int argc, char **argv)
static const struct option long_options[] = {
{"commit-after", no_argument, NULL, 'c'},  /* commit 
mode 1 */
{"commit-each", no_argument, NULL, 'C'},  /* commit 
mode 2 */
+   {"verbose", no_argument, NULL, 'v'},
{NULL, 0, NULL, 0}
};
 
-   c = getopt_long(argc, argv, "cC", long_options, NULL);
+   c = getopt_long(argc, argv, "cCv", long_options, NULL);
if (c < 0)
break;
 
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Chris Murphy

On Thu, Sep 15, 2016 at 12:20 PM, Austin S. Hemmelgarn
 wrote:

> 2. We're developing new features without making sure that check can fix
> issues in any associated metadata.  Part of merging a new feature needs to
> be proving that fsck can handle fixing any issues in the metadata for that
> feature short of total data loss or complete corruption.
>
> 3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
> should be handling things like Windows does.  Preform slightly better
> checking when reading data, and if we see an error, flag the filesystem for
> expensive repair on the next mount.

Right, well I'm vaguely curious why ZFS, as different as it is,
basically take the position that if the hardware went so batshit that
they can't unwind it on a normal mount, then an fsck probably can't
help either... they still don't have an fsck and don't appear to want
one.

I'm not sure if the brfsck is really all that helpful to user as much
as it is for developers to better learn about the failure vectors of
the file system.


> 4. Btrfs check should know itself if it can fix something or not, and that
> should be reported.  I have an otherwise perfectly fine filesystem that
> throws some (apparently harmless) errors in check, and check can't repair
> them.  Despite this, it gives zero indication that it can't repair them,
> zero indication that it didn't repair them, and doesn't even seem to give a
> non-zero exit status for this filesystem.

Yeah, it's really not a user tool in my view...



>
> As far as the other tools:
> - Self-repair at mount time: This isn't a repair tool, if the FS mounts,
> it's not broken, it's just a messy and the kernel is tidying things up.
> - btrfsck/btrfs check: I think I covered the issues here well.
> - Mount options: These are mostly just for expensive checks during mount,
> and most people should never need them except in very unusual circumstances.
> - btrfs rescue *: These are all fixes for very specific issues.  They should
> be folded into check with special aliases, and not be separate tools.  The
> first fixes an issue that's pretty much non-existent in any modern kernel,
> and the other two are for very low-level data recovery of horribly broken
> filesystems.
> - scrub: This is a very purpose specific tool which is supposed to be part
> of regular maintainence, and only works to fix things as a side effect of
> what it does.
> - balance: This is also a relatively purpose specific tool, and again only
> fixes things as a side effect of what it does.
>

Yeah I know, it's just much of this is non-obvious to users unfamiliar
with this file system. And even I'm often throwing spaghetti on a
wall.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: kill BUG_ON in do_relocation

2016-09-15 Thread Chris Mason




On 09/15/2016 03:01 PM, Liu Bo wrote:

On Wed, Sep 14, 2016 at 11:19:04AM -0700, Liu Bo wrote:

On Wed, Sep 14, 2016 at 01:31:31PM -0400, Josef Bacik wrote:

On 09/14/2016 01:29 PM, Chris Mason wrote:



On 09/14/2016 01:13 PM, Josef Bacik wrote:

On 09/14/2016 12:27 PM, Liu Bo wrote:

While updating btree, we try to push items between sibling
nodes/leaves in order to keep height as low as possible.
But we don't memset the original places with zero when
pushing items so that we could end up leaving stale content
in nodes/leaves.  One may read the above stale content by
increasing btree blocks' @nritems.



Ok this sounds really bad.  Is this as bad as I think it sounds?  We
should probably fix this like right now right?


He's bumping @nritems with a fuzzer I think?  As in this happens when someone
forces it (or via some other bug) but not in normal operations.



Oh ok if this happens with a fuzzer than this is fine, but I'd rather do
-EIO so we know this is something bad with the fs.


-EIO may be more appropriate to be given while reading btree blocks and
checking their validation?


Looks like EIO doesn't fit into this case, either, do we have any errno
representing 'corrupted filesystem'?


That's EIO.  Sometimes the EIO is big enough we have to abort, but 
really the abort is just adding bonus.


-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: handle quota reserve failure properly

2016-09-15 Thread Josef Bacik

btrfs/022 was spitting a warning for the case that we exceed the quota.  If we
fail to make our quota reservation we need to clean up our data space
reservation.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/extent-tree.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 03da2f6..d72eaae 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4286,13 +4286,10 @@ int btrfs_check_data_free_space(struct inode *inode, 
u64 start, u64 len)
if (ret < 0)
return ret;
 
-   /*
-* Use new btrfs_qgroup_reserve_data to reserve precious data space
-*
-* TODO: Find a good method to avoid reserve data space for NOCOW
-* range, but don't impact performance on quota disable case.
-*/
+   /* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
ret = btrfs_qgroup_reserve_data(inode, start, len);
+   if (ret)
+   btrfs_free_reserved_data_space_noquota(inode, start, len);
return ret;
 }
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: kill BUG_ON in do_relocation

2016-09-15 Thread Liu Bo

On Wed, Sep 14, 2016 at 11:19:04AM -0700, Liu Bo wrote:
> On Wed, Sep 14, 2016 at 01:31:31PM -0400, Josef Bacik wrote:
> > On 09/14/2016 01:29 PM, Chris Mason wrote:
> > > 
> > > 
> > > On 09/14/2016 01:13 PM, Josef Bacik wrote:
> > > > On 09/14/2016 12:27 PM, Liu Bo wrote:
> > > > > While updating btree, we try to push items between sibling
> > > > > nodes/leaves in order to keep height as low as possible.
> > > > > But we don't memset the original places with zero when
> > > > > pushing items so that we could end up leaving stale content
> > > > > in nodes/leaves.  One may read the above stale content by
> > > > > increasing btree blocks' @nritems.
> > > > > 
> > > > 
> > > > Ok this sounds really bad.  Is this as bad as I think it sounds?  We
> > > > should probably fix this like right now right?
> > > 
> > > He's bumping @nritems with a fuzzer I think?  As in this happens when 
> > > someone
> > > forces it (or via some other bug) but not in normal operations.
> > > 
> > 
> > Oh ok if this happens with a fuzzer than this is fine, but I'd rather do
> > -EIO so we know this is something bad with the fs. 
> 
> -EIO may be more appropriate to be given while reading btree blocks and
> checking their validation?

Looks like EIO doesn't fit into this case, either, do we have any errno
representing 'corrupted filesystem'?

Thanks,

-liubo

> 
> > And change the changelog
> > to make it explicit that this is the result of fs corruption, not normal
> > operation.  Then you can add
> > 
> > Reviewed-by: Josef Bacik 
> 
> OK, make sense.
> 
> Thanks,
> 
> -liubo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Austin S. Hemmelgarn


On 2016-09-15 14:01, Chris Murphy wrote:

On Tue, Sep 13, 2016 at 5:35 AM, Austin S. Hemmelgarn
 wrote:

On 2016-09-12 16:08, Chris Murphy wrote:


- btrfsck status
e.g. btrfs-progs 4.7.2 still warns against using --repair, and lists
it under dangerous options also;  while that's true, Btrfs can't be
considered stable or recommended by default
e.g. There's still way too many separate repair tools for Btrfs.
Depending on how you count there's at least 4, and more realistically
8 ways, scattered across multiple commands. This excludes btrfs
check's -E, -r, and -s flags. And it ignores sequence in the success
rate. The permutations are just excessive. It's definitely not easy to
know how to fix a Btrfs volume should things go wrong.


I assume you're counting balance and scrub in that, plus check gives 3, what
are you considering the 4th?


- Self repair at mount time, similar to other fs's with a journal
- fsck, similar to other fs's except the output is really unclear
about what the prognosis is compared to ext4 or xfs
- mount option usebackuproot/recovery
- btrfs rescue zero-log
- btrfs rescue super-recover
- btrfs rescue chunk-recover
- scrub
- balance

check --repair really needed to be fail safe a long time ago, it's
what everyone's come to expect from fsck's, that they don't make
things worse; and in particular on Btrfs it seems like its repairs
should be reversible but the reality is the man page says do not use
(except under advisement) and that it's dangerous (twice). And a user
got a broken system in the bug that affects 4.7, 4.7.1, that 4.7.2
apparently can't fix. So... life is hard, file systems are hard. But
it's also hard to see how distros can possibly feel comfortable with
Btrfs by default when the fsck tool is dangerous, even if in theory it
shouldn't often be necessary.


For check specifically, I see four issues:
1. It spits out pretty low-level information about the internals in many 
cases when it returns an error.  xfs_repair does this too, but it's 
needed even less frequently than btrfs check, and it at least uses 
relatively simple jargon by comparison.  I've been using BTRFS for years 
and still can't tell what more than half the error messages check can 
return mean.  In contrast to that, deciphering an error message from 
e2fsck is pretty trivial if you have some basic understanding of VFS 
level filesystem abstractions (stuff like what inodes and dentries are), 
and I never needed to learn low level things about the internals of ext4 
to parse the fsck output (I did anyway, but that's beside the point).


2. We're developing new features without making sure that check can fix 
issues in any associated metadata.  Part of merging a new feature needs 
to be proving that fsck can handle fixing any issues in the metadata for 
that feature short of total data loss or complete corruption.


3. Fsck should be needed only for un-mountable filesystems.  Ideally, we 
should be handling things like Windows does.  Preform slightly better 
checking when reading data, and if we see an error, flag the filesystem 
for expensive repair on the next mount.


4. Btrfs check should know itself if it can fix something or not, and 
that should be reported.  I have an otherwise perfectly fine filesystem 
that throws some (apparently harmless) errors in check, and check can't 
repair them.  Despite this, it gives zero indication that it can't 
repair them, zero indication that it didn't repair them, and doesn't 
even seem to give a non-zero exit status for this filesystem.


As far as the other tools:
- Self-repair at mount time: This isn't a repair tool, if the FS mounts, 
it's not broken, it's just a messy and the kernel is tidying things up.

- btrfsck/btrfs check: I think I covered the issues here well.
- Mount options: These are mostly just for expensive checks during 
mount, and most people should never need them except in very unusual 
circumstances.
- btrfs rescue *: These are all fixes for very specific issues.  They 
should be folded into check with special aliases, and not be separate 
tools.  The first fixes an issue that's pretty much non-existent in any 
modern kernel, and the other two are for very low-level data recovery of 
horribly broken filesystems.
- scrub: This is a very purpose specific tool which is supposed to be 
part of regular maintainence, and only works to fix things as a side 
effect of what it does.
- balance: This is also a relatively purpose specific tool, and again 
only fixes things as a side effect of what it does.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Chris Murphy

On Tue, Sep 13, 2016 at 5:35 AM, Austin S. Hemmelgarn
 wrote:
> On 2016-09-12 16:08, Chris Murphy wrote:
>>
>> - btrfsck status
>> e.g. btrfs-progs 4.7.2 still warns against using --repair, and lists
>> it under dangerous options also;  while that's true, Btrfs can't be
>> considered stable or recommended by default
>> e.g. There's still way too many separate repair tools for Btrfs.
>> Depending on how you count there's at least 4, and more realistically
>> 8 ways, scattered across multiple commands. This excludes btrfs
>> check's -E, -r, and -s flags. And it ignores sequence in the success
>> rate. The permutations are just excessive. It's definitely not easy to
>> know how to fix a Btrfs volume should things go wrong.
>
> I assume you're counting balance and scrub in that, plus check gives 3, what
> are you considering the 4th?

- Self repair at mount time, similar to other fs's with a journal
- fsck, similar to other fs's except the output is really unclear
about what the prognosis is compared to ext4 or xfs
- mount option usebackuproot/recovery
- btrfs rescue zero-log
- btrfs rescue super-recover
- btrfs rescue chunk-recover
- scrub
- balance

check --repair really needed to be fail safe a long time ago, it's
what everyone's come to expect from fsck's, that they don't make
things worse; and in particular on Btrfs it seems like its repairs
should be reversible but the reality is the man page says do not use
(except under advisement) and that it's dangerous (twice). And a user
got a broken system in the bug that affects 4.7, 4.7.1, that 4.7.2
apparently can't fix. So... life is hard, file systems are hard. But
it's also hard to see how distros can possibly feel comfortable with
Btrfs by default when the fsck tool is dangerous, even if in theory it
shouldn't often be necessary.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: multi-device btrfs with single data mode and disk failure

2016-09-15 Thread Chris Murphy

On Thu, Sep 15, 2016 at 10:30 AM, Alexandre Poux  wrote:
> Thank you very much for your answers
>
> Le 15/09/2016 à 17:38, Chris Murphy a écrit :
>> On Thu, Sep 15, 2016 at 1:44 AM, Alexandre Poux  wrote:
>>> Is it possible to do some king of a "btrfs delete missing" on this
>>> kind of setup, in order to recover access in rw to my other data, or
>>> I must copy all my data on a new partition
>> That *should* work :) Except that your file system with 6 drives is
>> too full to be shrunk to 5 drives. Btrfs will either refuse, or get
>> confused, about how to shrink a nearly full 6 drive volume into 5.
>>
>> So you'll have to do one of three things:
>>
>> 1. Add a 2+TB drive, then remove the missing one; OR
>> 2. btrfs replace is faster and is raid10 reliable; OR
>> 3. Read only scrub to get a file listing of bad files, then remount
>> read-write degraded and delete them all. Now you maybe can do a device
>> delete missing. But it's still a tight fit, it basically has to
>> balance things out to get it to fit on an odd number of drives, it may
>> actually not work even though there seems to be enough total space,
>> there has to be enough space on FOUR drives.
>>
> Are you sure you are talking about data in single mode ?
> I don't understand why you are talking about raid10,
> or the fact that it will have to rebalance everything.

Yeah sorry I got confused in that very last sentence. Single, it will
find space in 1GiB increments. Of course this fails because that data
doesn't exist anymore, but to start the operation it needs to be
possible.

>
> Moreover, even in degraded mode I cannot mount it in rw
> It tells me
> "too many missing devices, writeable remount is not allowed"
> due to the fact I'm in single mode.

Oh you're in that trap. Well now you're stuck. I've had the case where
I could mount read write degraded with metadata raid1 and data single,
but it was good for only one mount and then I get the same message you
get and it was only possible to mount read only. At that point it's
totally suck unless you're adept at manipulating the file system with
a hex editor...

Someone might have a patch somewhere that drops this check and lets
too many missing devices to mount anyway... I seem to recall this.
It'd be in the archives if it exists.

> And as far as as know, btrfs replace and btrfs delete, are not supposed
> to work in read only...

It doesn't. Must be read write mounted.

>
> I would like to tell him forgot about the missing data, and give me back
> my partition.

This feature doesn't exist yet. I really want to see this, it'd be
great for ceph and gluster if the volume could lose a drive, report
all the missing files to the cluster file system, delete the device
and the file references, and then the cluster knows that brick doesn't
have those files and can replicate them somewhere else or even back to
the brick that had them.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: multi-device btrfs with single data mode and disk failure

2016-09-15 Thread Alexandre Poux

Thank you very much for your answers

Le 15/09/2016 à 17:38, Chris Murphy a écrit :
> On Thu, Sep 15, 2016 at 1:44 AM, Alexandre Poux  wrote:
>> Is it possible to do some king of a "btrfs delete missing" on this
>> kind of setup, in order to recover access in rw to my other data, or
>> I must copy all my data on a new partition
> That *should* work :) Except that your file system with 6 drives is
> too full to be shrunk to 5 drives. Btrfs will either refuse, or get
> confused, about how to shrink a nearly full 6 drive volume into 5.
>
> So you'll have to do one of three things:
>
> 1. Add a 2+TB drive, then remove the missing one; OR
> 2. btrfs replace is faster and is raid10 reliable; OR
> 3. Read only scrub to get a file listing of bad files, then remount
> read-write degraded and delete them all. Now you maybe can do a device
> delete missing. But it's still a tight fit, it basically has to
> balance things out to get it to fit on an odd number of drives, it may
> actually not work even though there seems to be enough total space,
> there has to be enough space on FOUR drives.
>
Are you sure you are talking about data in single mode ?
I don't understand why you are talking about raid10,
or the fact that it will have to rebalance everything.

Moreover, even in degraded mode I cannot mount it in rw
It tells me
"too many missing devices, writeable remount is not allowed"
due to the fact I'm in single mode.

And as far as as know, btrfs replace and btrfs delete, are not supposed
to work in read only...

I would like to tell him forgot about the missing data, and give me back
my partition.

In fact I'm pretty sure, there was no data at all on the dead device,
only metadata in raid1.
I'm currently scrubing to be absolutely sure

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: unable to handle kernel paging request

2016-09-15 Thread Chris Mason


On 09/15/2016 10:08 AM, Mark Gavalda wrote:

Hi,

Bumped into the following one today; kernel 4.4.0-36-generic Ubuntu
16.4.1; CPU went to 100% and only a hard restart solved the issue.
Since then everything's back to normal.

Please let me know how can I help get to the bottom of this?


I saw similar traces when tracking down this bug:

https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?h=for-linus-4.8=cbd60aa7cd17d81a434234268c55192862147439

It's flagged for stable, so you'll get it with the next stable update, 
or you can apply it by hand and rebuild.


-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Size of scrubbed Data

2016-09-15 Thread Stefan Malte Schumacher

Hello


I have encountered a very strange phenomenon while using btrfs-scrub.
I believe it may be a result of replacing my old installation of
Debian Jessie with Debian Stretch, resulting in a Kernel Switch from
3.16+63 to 4.6.0-1. I scrub my filesystem once a month and let anacron
send me the results. My filesystem, consisting of four four-gigabyte
drives with both data and metadata as RAID1 was reported as containing
nearly 12TiB of data in scrubs done in May, June, July and August. But
then it changed and suddenly shows only 9TiB in size, despite the fact
that I did not delete any large files. If I remember correctly my
switch from Debian Jessie to Stretch was around that time period.
Could someone explain this behavior to me? Was a new way of
calculating the size of scrubed data introduced? How can I check if I
have lost data? I have a backup, but only one generation and rsync
will by now have deleted files on the NAS, which might have lost on
the fileserver. According to the long and short self-tests, which I
run with smartmontools my drives are alright. How do I proceed?



Yours

Stefan

uname -a
Linux mars 4.6.0-1-amd64 #1 SMP Debian 4.6.4-1 (2016-07-18) x86_64 GNU/Linux

btrfs --version
btrfs-progs v4.7.1

 btrfs fi show
Label: none  uuid: 8c668854-db5d-45a7-875d-43c4e82a829e
Total devices 4 FS bytes used 6.06TiB
devid1 size 3.64TiB used 3.09TiB path /dev/sde
devid2 size 3.64TiB used 3.09TiB path /dev/sdc
devid3 size 3.64TiB used 3.09TiB path /dev/sdd
devid4 size 3.64TiB used 3.09TiB path /dev/sda


 btrfs fi df /mnt/btrfs-raid/
Data, RAID1: total=6.17TiB, used=6.05TiB
System, RAID1: total=32.00MiB, used=916.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=10.00GiB, used=8.14GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Maybe this is also of use in identifying the problem:
grep btrfs *
grep: apt: Ist ein Verzeichnis
grep: cups: Ist ein Verzeichnis
dpkg.log:2016-09-03 15:20:16 upgrade btrfs-progs:amd64 4.7-1 4.7.1-1
dpkg.log:2016-09-03 15:20:16 status triggers-awaited btrfs-progs:amd64 4.7-1
dpkg.log:2016-09-03 15:20:16 status half-configured btrfs-progs:amd64 4.7-1
dpkg.log:2016-09-03 15:20:16 status unpacked btrfs-progs:amd64 4.7-1
dpkg.log:2016-09-03 15:20:16 status half-installed btrfs-progs:amd64 4.7-1
dpkg.log:2016-09-03 15:20:16 status half-installed btrfs-progs:amd64 4.7-1
dpkg.log:2016-09-03 15:20:17 status unpacked btrfs-progs:amd64 4.7.1-1
dpkg.log:2016-09-03 15:20:17 status unpacked btrfs-progs:amd64 4.7.1-1
dpkg.log:2016-09-03 15:20:45 configure btrfs-progs:amd64 4.7.1-1 
dpkg.log:2016-09-03 15:20:45 status unpacked btrfs-progs:amd64 4.7.1-1
dpkg.log:2016-09-03 15:20:45 status unpacked btrfs-progs:amd64 4.7.1-1
dpkg.log:2016-09-03 15:20:45 status half-configured btrfs-progs:amd64 4.7.1-1
dpkg.log:2016-09-03 15:20:46 status triggers-awaited btrfs-progs:amd64 4.7.1-1
dpkg.log:2016-09-03 15:20:51 status installed btrfs-progs:amd64 4.7.1-1
dpkg.log.1:2016-08-10 16:58:23 upgrade btrfs-progs:amd64 4.5.2-1 4.6.1-1
dpkg.log.1:2016-08-10 16:58:23 status triggers-awaited btrfs-progs:amd64 4.5.2-1
dpkg.log.1:2016-08-10 16:58:23 status half-configured btrfs-progs:amd64 4.5.2-1
dpkg.log.1:2016-08-10 16:58:23 status unpacked btrfs-progs:amd64 4.5.2-1
dpkg.log.1:2016-08-10 16:58:23 status half-installed btrfs-progs:amd64 4.5.2-1
dpkg.log.1:2016-08-10 16:58:24 status half-installed btrfs-progs:amd64 4.5.2-1
dpkg.log.1:2016-08-10 16:58:24 status unpacked btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-10 16:58:24 status unpacked btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-10 17:01:25 configure btrfs-progs:amd64 4.6.1-1 
dpkg.log.1:2016-08-10 17:01:25 status unpacked btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-10 17:01:26 status unpacked btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-10 17:01:26 status half-configured btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-10 17:01:26 status triggers-awaited btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-10 17:02:34 status installed btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-19 00:45:05 upgrade btrfs-progs:amd64 4.6.1-1 4.7-1
dpkg.log.1:2016-08-19 00:45:05 status triggers-awaited btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-19 00:45:05 status half-configured btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-19 00:45:05 status unpacked btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-19 00:45:05 status half-installed btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-19 00:45:06 status half-installed btrfs-progs:amd64 4.6.1-1
dpkg.log.1:2016-08-19 00:45:06 status unpacked btrfs-progs:amd64 4.7-1
dpkg.log.1:2016-08-19 00:45:06 status unpacked btrfs-progs:amd64 4.7-1
dpkg.log.1:2016-08-19 00:47:06 configure btrfs-progs:amd64 4.7-1 
dpkg.log.1:2016-08-19 00:47:06 status unpacked btrfs-progs:amd64 4.7-1
dpkg.log.1:2016-08-19 00:47:06 status unpacked btrfs-progs:amd64 4.7-1
dpkg.log.1:2016-08-19 00:47:06 status half-configured btrfs-progs:amd64 4.7-1
dpkg.log.1:2016-08-19 00:47:06 status

Thoughts on btrfs RAID-1 for cold storage/archive?

2016-09-15 Thread E V

I'm investigating using btrfs for archiving old data and offsite
storage, essentially put 2 drives in btrfs RAID-1, copy the data to
the filesystem and then unmount, remove a drive and take it to an
offsite location. Remount the other drive -o ro,degraded until my
systems slots fill up, then remove the local drive and put it on a
shelf. I'd verify the file md5sums after data is written to the drive
for piece of mind, but maybe a btrfs scrub would give the same
assurances. Seem straightforward? Anything to look out for? Long term
format stability seems good, right? Also, I like the idea of being
able to pull the offsite drive back and scrub if the local drive ever
has problems, a nice extra piece of mind we wouldn't get with ext4.
Currently using the 4.1.32 kernel since the driver for the r750 card
in our 45 drives system only supports up to 4.3 ATM.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: multi-device btrfs with single data mode and disk failure

2016-09-15 Thread Chris Murphy

On Thu, Sep 15, 2016 at 1:44 AM, Alexandre Poux  wrote:
> I had a btrfs partition on a 6 disk array without raid (metadata in
> raid10, but data in single), and one of the disks just died.
>
> So I lost some of my data, ok, I knew that.
>
> But two question :
>
>   *
>
> Is it possible to know (using metadata I suppose) what data I have
> lost ?

The safest option is to remount read only and do a read only scrub.
That will spit out messages for corrupt (missing) metadata and data,
to the kernel message buffer.  The  missing data will appear as
corrupt files that can't be fixed with full file paths. There will
likely be so many that dmesg will be useless so you'll need to use
journalctl -fk to follow the scrub; or journalctl -bk after the fact
or even -b-1 -k or -b-2 -k, etc.; or /var/log/messages as it's
probably going to exceed the kernel message buffer, so dmesg won't
help.

> Is it possible to do some king of a "btrfs delete missing" on this
> kind of setup, in order to recover access in rw to my other data, or
> I must copy all my data on a new partition

That *should* work :) Except that your file system with 6 drives is
too full to be shrunk to 5 drives. Btrfs will either refuse, or get
confused, about how to shrink a nearly full 6 drive volume into 5.

So you'll have to do one of three things:

1. Add a 2+TB drive, then remove the missing one; OR
2. btrfs replace is faster and is raid10 reliable; OR
3. Read only scrub to get a file listing of bad files, then remount
read-write degraded and delete them all. Now you maybe can do a device
delete missing. But it's still a tight fit, it basically has to
balance things out to get it to fit on an odd number of drives, it may
actually not work even though there seems to be enough total space,
there has to be enough space on FOUR drives.

I'd go with option 2.  And that should still spit out the paths to bad
files. If the replace works, I'm pretty sure you still need to delete
all of the files that are missing in order to get rid of the
corruption warnings on any subsequent scrub or balance.

>
> btrfs --version :
> btrfs-progs v4.7.1

You should upgrade to 4.7.2 or downgrade to 4.6.1 before doing btrfs
check. Not urgent so long as you don't actually do a repair with this
version.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: stability matrix

2016-09-15 Thread Martin Steigerwald

Am Donnerstag, 15. September 2016, 07:54:26 CEST schrieb Austin S. Hemmelgarn:
> On 2016-09-15 05:49, Hans van Kranenburg wrote:
> > On 09/15/2016 04:14 AM, Christoph Anton Mitterer wrote:
[…]
> I specifically do not think we should worry about distro kernels though.
>   If someone is using a specific distro, that distro's documentation
> should cover what they support and what works and what doesn't.  Some
> (like Arch and to a lesser extent Gentoo) use almost upstream kernels,
> so there's very little point in tracking them.  Some (like Ubuntu and
> Debian) use almost upstream LTS kernels, so there's little point
> tracking them either.  Many others though (like CentOS, RHEL, and OEL)
> Use forked kernels that have so many back-ported patches that it's
> impossible to track up-date to up-date what the hell they've got.  A
> rather ridiculous expression regarding herding of cats comes to mind
> with respect to the last group.

Yep. I just read through RHEL releasenotes for a RHEL 7 workshop I will hold 
for a customer… and noted that newer RHEL 7 kernels for example have device 
mapper from Kernel 4.1 (while the kernel still says its a 3.10 one), XFS from 
kernel this.that, including new incompat CRC disk format and the need to also 
upgrade xfsprogs in lockstep, and this and that from kernel this.that and so 
on. Frankenstein as an association comes to my mind, but I bet RHEL kernel 
engineers know what they are doing.

-- 
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Preliminary BTRFS Encryption

2016-09-15 Thread Austin S. Hemmelgarn


On 2016-09-15 10:06, Anand Jain wrote:


Thanks for comments.
Pls see inline as below.

On 09/15/2016 07:37 PM, Austin S. Hemmelgarn wrote:

On 2016-09-13 09:39, Anand Jain wrote:


This patchset adds btrfs encryption support.

The main objective of this series is to have bugs fixed and stability.
I have verified with fstests to confirm that there is no regression.

A design write-up is coming next, however here below is the quick
example
on the cli usage. Please try out, let me know if I have missed
something.

Also would like to mention that a review from the security experts is
due,
which is important and I believe those review comments can be
accommodated
without major changes from here.

Also yes, thanks for the emails, I hear, per file encryption and inline
with vfs layer is also important, which is wip among other things in the
list.

As of now these patch set supports encryption on per subvolume, as
managing properties on per subvolume is a kind of core to btrfs,
which is
easier for data center solution-ing, seamlessly persistent and easy to
manage.


Steps:
-

Make sure following kernel TFMs are compiled in.
# cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
name : ctr(aes)
name : cbc(aes)

Create encrypted subvolume.
# btrfs su create -e 'ctr(aes)' /btrfs/e1
Create subvolume '/btrfs/e1'
Passphrase:
Again passphrase:

A key is created and its hash is updated into the subvolume item,
and then added to the system keyctl.
# btrfs su show /btrfs/e1 | egrep -i encrypt
Encryption: ctr(aes)@btrfs:75197c8e (594790215)

# keyctl show 594790215
Keyring
 594790215 --alsw-v  0 0  logon: btrfs:75197c8e


Now any file data extents under the subvol /btrfs/e1 will be
encrypted.

You may revoke key using keyctl or btrfs(8) as below.
# btrfs su encrypt -k out /btrfs/e1

# btrfs su show /btrfs/e1 | egrep -i encrypt
Encryption: ctr(aes)@btrfs:75197c8e (Required key not
available)

# keyctl show 594790215
Keyring
Unable to dump key: Key has been revoked

As the key hash is updated, If you provide wrong passphrase in the next
key in, it won't add key to the system. So we have key verification
from the day1.

# btrfs su encrypt -k in /btrfs/e1
Passphrase:
Again passphrase:
ERROR: failed to set attribute 'btrfs.encrypt' to
'ctr(aes)@btrfs:75197c8e' : Key was rejected by service

ERROR: key set failed: Key was rejected by service

# btrfs su encrypt -k in /btrfs/e1
Passphrase:
Again passphrase:
key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'

Now if you revoke the key the read / write fails with key error.

# md5sum /btrfs/e1/2k-test-file
8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file

# btrfs su encrypt -k out /btrfs/e1

# md5sum /btrfs/e1/2k-test-file
md5sum: /btrfs/e1/2k-test-file: Key has been revoked

# cp /tfs/1k-test-file /btrfs/e1/
cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been
revoked

Plain text memory scratches for security reason is pending. As there
are some
key revoke notification challenges to coincide with encryption context
switch,
which I do believe should be fixed in the due course, but is not a
roadblock
at this stage.






Before I make any other comments, I should state that I asbolutely agree
with Alex Elsayed about the issues with using CBC or CTR mode, and not
supporting AE or AEAD modes.


  Alex comments was quite detailed, I did reply to it.
  Looks like you missed my reply to Alex's comments ?
I've been having issues with GMail delaying random e-mails for excessive 
amounts of time (hours sometimes), so I didn't see your reply before 
sending this.  Even so, I do want it on the record that I agree with him 
completely.



How does this handle cloning of extents?  Can extents be cloned across
subvolume boundaries when one of the subvolumes is encrypted?


 Yes only if both the subvol keys match.

OK, that makes sense.



Can they
be cloned within an encrypted subvolume?


 Yes. That's things as usual.
Glad to see that that still works.  Most people I know who do batch 
deduplication do so within subvolumes but not across them, so that still 
working with encrypted subvolumes is a good thing.



What happens when you try to
clone them in either case if it isn't supported?


 Gets -EOPNOTSUPP.
That actually makes more sense than what my first thought for a return 
code was (-EINVAL).

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: stability matrix

2016-09-15 Thread Chris Murphy

On Thu, Sep 15, 2016 at 5:54 AM, Austin S. Hemmelgarn
 wrote:
>
>
> I specifically do not think we should worry about distro kernels though.

It will be essentially impossible to keep such a thing up to date.
It's difficult in the best case scenario to even track upstream's own
backports to longterm kernels, whether those would actually even
change anything in the matrix.

I'd say each major version gets it's own page, and just dup the page
for each version.

So for starters, the current page is for version 4.7. If when 4.8 is
released there's no significant change in stability that affects the
color (stability status) of any listed feature, then that page could
say 4.7 through current. If it's true that the status page has no
major changes going back to 4.4 through current, label it that way.

As soon as there's a change that affects the color coding of an item
in the grid, duplicate the page. Old page gets a fixed range of
kernels, say 4.4 to 4.7. And now the newest page is 4.8 - current.

I think a column for version will lose the historical perspective of
when something goes from red to yellow, yellow to green.

> If
> someone is using a specific distro, that distro's documentation should cover
> what they support and what works and what doesn't.  Some (like Arch and to a
> lesser extent Gentoo) use almost upstream kernels, so there's very little
> point in tracking them.  Some (like Ubuntu and Debian) use almost upstream
> LTS kernels, so there's little point tracking them either.  Many others
> though (like CentOS, RHEL, and OEL) Use forked kernels that have so many
> back-ported patches that it's impossible to track up-date to up-date what
> the hell they've got.  A rather ridiculous expression regarding herding of
> cats comes to mind with respect to the last group.

Yeah you need the secret decoder ring to sort it out. Forget it, not worth it.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

unable to handle kernel paging request

2016-09-15 Thread Mark Gavalda

Hi,

Bumped into the following one today; kernel 4.4.0-36-generic Ubuntu
16.4.1; CPU went to 100% and only a hard restart solved the issue.
Since then everything's back to normal.

Please let me know how can I help get to the bottom of this?

[239049.350514] BUG: unable to handle kernel paging request at d3c53de8
[239049.358107] IP: [] hrtimer_active+0x9/0x60
[239049.364127] PGD a688df067 PUD 0
[239049.367828] Oops:  [#2] SMP
[239049.371543] Modules linked in: xt_recent xt_nat xt_multiport
ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6
nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 xt_limit xt_addrtype
xt_conntrack binfmt_misc veth xt_CHECKSUM iptable_mangle xt_tcpudp
ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables
x_tables ppdev parport_pc parport serio_raw pvpanic ib_iser rdma_cm
iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul
crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd psmouse virtio_scsi
[239049.457553] CPU: 24 PID: 1298718 Comm: kworker/u64:24 Tainted: G
   D 4.4.0-36-generic #55-Ubuntu
[239049.467497] Hardware name: Google Google/Google, BIOS Google 01/01/2011
[239049.474348] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[239049.481472] task: 8801512dee00 ti: 8801fd1c8000 task.ti:
8801fd1c8000
[239049.489205] RIP: 0010:[]  []
hrtimer_active+0x9/0x60
[239049.497702] RSP: 0018:8801fd1cbc20  EFLAGS: 00010046
[239049.503230] RAX:  RBX: d3c53db8 RCX:

[239049.510569] RDX:  RSI: 0003 RDI:
d3c53db8
[239049.517910] RBP: 8801fd1cbc20 R08: 88336f416d00 R09:
88333fe57e00
[239049.525250] R10: 00a0 R11: 01b5675f R12:
8807024cb628
[239049.532595] R13: 8802d5e5bd98 R14:  R15:
0003
[239049.539938] FS:  () GS:88336f40()
knlGS:
[239049.548246] CS:  0010 DS:  ES:  CR0: 80050033
[239049.554234] CR2: d3c53de8 CR3: 000a5e8a1000 CR4:
001406e0
[239049.561576] Stack:
[239049.563812]  8801fd1cbc58 810ef249 0003
d3b71a0d
[239049.571936]  d3c53db8 8807024cb628 8802d5e5bd98
8801fd1cbca8
[239049.580141]  8182d3c1 810c35f2 00010001

[239049.588300] Call Trace:
[239049.590976]  [] hrtimer_try_to_cancel+0x29/0x130
[239049.597384]  [] schedule_hrtimeout_range_clock+0xd1/0x1b0
[239049.604571]  [] ? __wake_up_common+0x52/0x90
[239049.610619]  [] __wake_up+0x39/0x50
[239049.615906]  []
btrfs_remove_ordered_extent+0x154/0x250 [btrfs]
[239049.623620]  []
btrfs_finish_ordered_io+0x1d0/0x650 [btrfs]
[239049.630993]  [] finish_ordered_fn+0x15/0x20 [btrfs]
[239049.637666]  []
btrfs_scrubparity_helper+0xca/0x2f0 [btrfs]
[239049.645042]  [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
[239049.652239]  [] process_one_work+0x165/0x480
[239049.658293]  [] worker_thread+0x4b/0x4c0
[239049.663984]  [] ? process_one_work+0x480/0x480
[239049.670196]  [] ? process_one_work+0x480/0x480
[239049.676420]  [] kthread+0xd8/0xf0
[239049.681514]  [] ? kthread_create_on_node+0x1e0/0x1e0
[239049.688260]  [] ret_from_fork+0x3f/0x70
[239049.693870]  [] ? kthread_create_on_node+0x1e0/0x1e0
[239049.700599] Code: 00 00 0f 1f 44 00 00 55 48 c7 47 28 10 f0 0e 81
48 89 77 58 48 89 e5 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
55 48 89 e5 <48> 8b 57 30 eb 1d 80 7f 38 00 75 32 48 3b 78 08 74 2c 39
50 04
[239049.727797] RIP  [] hrtimer_active+0x9/0x60
[239049.733868]  RSP 
[239049.737557] CR2: d3c53de8
[239049.741535] ---[ end trace 774da4af66731bb5 ]---

Thanks,
Mark Gavalda
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: mkfs+mount failure of small fs on ppc64

2016-09-15 Thread Eric Sandeen

On 9/13/16 4:44 PM, Eric Sandeen wrote:
> on ppc64, 4.7-rc kernel, git btrfs-progs, v4.7.2:
> 
> # truncate --size=500m testfile
> # ./mkfs.btrfs testfile
> # mkdir -p mnt
> # mount -o loop testfile mnt

Same failure on aarch64 if that makes it any more interesting.  ;)

# mount -o loop testfile mnt
mount: mount /dev/loop0 on /root/mnt failed: No space left on device

Sector size issue I guess, driven by page size.

-Eric

> btrfs-progs v4.7.2
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:  (null)
> UUID:   c531b759-a491-4c9f-a954-4787cea9106d
> Node size:  65536
> Sector size:65536
> Filesystem size:500.00MiB
> Block group profiles:
>   Data: single8.00MiB
>   Metadata: DUP  32.00MiB
>   System:   DUP   8.00MiB
> SSD detected:   no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>IDSIZE  PATH
> 1   500.00MiB  testfile
> 
> 
> # dmesg -c
> [   61.210287] loop: module loaded
> [   61.247105] BTRFS: device fsid a8d79cd0-977f-4b93-8410-246dc08b3683 devid 
> 1 transid 5 /dev/loop0
> [   61.247391] BTRFS info (device loop0): disk space caching is enabled
> [   61.247397] BTRFS info (device loop0): has skinny extents
> [   61.270492] BTRFS info (device loop0): creating UUID tree
> [   61.312149] BTRFS warning (device loop0): failed to create the UUID tree: 
> -28
> [   61.483028] BTRFS: open_ctree failed
> 
> 2nd mount works:
> 
> # mount -o loop testfile mnt
> # dmesg -c
> [   87.504564] BTRFS info (device loop0): disk space caching is enabled
> [   87.504579] BTRFS info (device loop0): has skinny extents
> [   87.506979] BTRFS info (device loop0): creating UUID tree
> 
> Any ideas?  This seems to have regressed since 3.9.1, but there are a couple
> other mkfs breakages in between, and my bisect was not fruitful.
> 
> Thanks,
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Preliminary BTRFS Encryption

2016-09-15 Thread Anand Jain



Thanks for comments.
Pls see inline as below.

On 09/15/2016 07:37 PM, Austin S. Hemmelgarn wrote:

On 2016-09-13 09:39, Anand Jain wrote:


This patchset adds btrfs encryption support.

The main objective of this series is to have bugs fixed and stability.
I have verified with fstests to confirm that there is no regression.

A design write-up is coming next, however here below is the quick example
on the cli usage. Please try out, let me know if I have missed something.

Also would like to mention that a review from the security experts is
due,
which is important and I believe those review comments can be
accommodated
without major changes from here.

Also yes, thanks for the emails, I hear, per file encryption and inline
with vfs layer is also important, which is wip among other things in the
list.

As of now these patch set supports encryption on per subvolume, as
managing properties on per subvolume is a kind of core to btrfs, which is
easier for data center solution-ing, seamlessly persistent and easy to
manage.


Steps:
-

Make sure following kernel TFMs are compiled in.
# cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
name : ctr(aes)
name : cbc(aes)

Create encrypted subvolume.
# btrfs su create -e 'ctr(aes)' /btrfs/e1
Create subvolume '/btrfs/e1'
Passphrase:
Again passphrase:

A key is created and its hash is updated into the subvolume item,
and then added to the system keyctl.
# btrfs su show /btrfs/e1 | egrep -i encrypt
Encryption: ctr(aes)@btrfs:75197c8e (594790215)

# keyctl show 594790215
Keyring
 594790215 --alsw-v  0 0  logon: btrfs:75197c8e


Now any file data extents under the subvol /btrfs/e1 will be
encrypted.

You may revoke key using keyctl or btrfs(8) as below.
# btrfs su encrypt -k out /btrfs/e1

# btrfs su show /btrfs/e1 | egrep -i encrypt
Encryption: ctr(aes)@btrfs:75197c8e (Required key not
available)

# keyctl show 594790215
Keyring
Unable to dump key: Key has been revoked

As the key hash is updated, If you provide wrong passphrase in the next
key in, it won't add key to the system. So we have key verification
from the day1.

# btrfs su encrypt -k in /btrfs/e1
Passphrase:
Again passphrase:
ERROR: failed to set attribute 'btrfs.encrypt' to
'ctr(aes)@btrfs:75197c8e' : Key was rejected by service

ERROR: key set failed: Key was rejected by service

# btrfs su encrypt -k in /btrfs/e1
Passphrase:
Again passphrase:
key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'

Now if you revoke the key the read / write fails with key error.

# md5sum /btrfs/e1/2k-test-file
8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file

# btrfs su encrypt -k out /btrfs/e1

# md5sum /btrfs/e1/2k-test-file
md5sum: /btrfs/e1/2k-test-file: Key has been revoked

# cp /tfs/1k-test-file /btrfs/e1/
cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been
revoked

Plain text memory scratches for security reason is pending. As there
are some
key revoke notification challenges to coincide with encryption context
switch,
which I do believe should be fixed in the due course, but is not a
roadblock
at this stage.






Before I make any other comments, I should state that I asbolutely agree
with Alex Elsayed about the issues with using CBC or CTR mode, and not
supporting AE or AEAD modes.


  Alex comments was quite detailed, I did reply to it.
  Looks like you missed my reply to Alex's comments ?


How does this handle cloning of extents?  Can extents be cloned across
subvolume boundaries when one of the subvolumes is encrypted?


 Yes only if both the subvol keys match.


Can they
be cloned within an encrypted subvolume?


 Yes. That's things as usual.


What happens when you try to
clone them in either case if it isn't supported?


 Gets -EOPNOTSUPP.

Thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH]btrfs-progs: btrfs-convert.c : check source file system state

2016-09-15 Thread Lakshmipathi.G

Signed-off-by: Lakshmipathi.G 
---
 btrfs-convert.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index c10dc17..27da9ce 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2171,6 +2171,17 @@ static void ext2_copy_inode_item(struct btrfs_inode_item 
*dst,
}
memset(>reserved, 0, sizeof(dst->reserved));
 }
+static int check_filesystem_state(struct btrfs_convert_context *cctx)
+{
+   ext2_filsys fs = cctx->fs_data;
+
+if (!(fs->super->s_state & EXT2_VALID_FS))
+   return 1;
+   else if (fs->super->s_state & EXT2_ERROR_FS)
+   return 1;
+   else
+   return 0;
+}
 
 /*
  * copy a single inode. do all the required works, such as cloning
@@ -2340,6 +2351,10 @@ static int do_convert(const char *devname, int datacsum, 
int packing,
ret = convert_open_fs(devname, );
if (ret)
goto fail;
+   ret = check_filesystem_state();
+   if (ret) 
+   warning("Source Filesystem is not clean, \
+running e2fsck is recommended.");
ret = convert_read_used_space();
if (ret)
goto fail;
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: stability matrix

2016-09-15 Thread Austin S. Hemmelgarn


On 2016-09-15 05:49, Hans van Kranenburg wrote:

On 09/15/2016 04:14 AM, Christoph Anton Mitterer wrote:

Hey.

As for the stability matrix...

In general:
- I think another column should be added, which tells when and for
  which kernel version the feature-status of each row was
  revised/updated the last time and especially by whom.
  If a core dev makes a statement on a particular feature, this
  probably means much more, than if it was made by "just" a list
  regular.
  And yes I know, in the beginning it already says "this is for 4.7"...
  but let's be honest, it's pretty likely when this is bumped to 4.8
  that not each and every point will be thoroughly checked again.
- Optionally even one further column could be added, that lists bugs
  where the specific cases are kept record of (if any).
- Perhaps a 3rd Status like "eats-your-data" which is worse than
  critical, e.g. for things were it's known that there is a high
  chance for still getting data corruption (RAID56?)


About the "for 4.7" issue... The Status page could have an extra column,
which for every OK labeled row lists the first version (kernel.org x.y.0
release) it's OK for.

The bugs make it more complicated.

* Feature A is labeled OK in kernel 5.0
* During development of kernel 8-rc, an eat my data bug is fixed. The OK
for this feature in the table is bumped to 8.0?
* kernel 5 is EOL
* kernel 6 is still supported, and the fix is applied to 6.12
* then there's distros which have their own old kernels, applying fixes
on them whenever they like, for example 5.6-distro4 which is leading its
own life

"Normal" users are using distro kernels. They shouldn't be panicing
about their data if they're running 6.14 or 5.6-distro4, but the OK in
the table is bumped to 8.0 because of the serious bugs.

At least the official kernels should be tracked in the table I think.

Separately, a list of known serious bugs per feature (like the 4 about
compression, http://www.spinics.net/lists/linux-btrfs/msg58674.html )
could be listed on another Bugs! page (lots of work) so a user, or
someone helping the user can see if the listed commits are or aren't
included in the actual whatever kernel a user is using.

This list of serious bugs could also help disussions that now sound like
"yeah, there were issues with compression which some time ago got fixed,
but noone knows what it was and when, so don't use compression".

Many of the commits which fix serious bugs (even if they're only
triggered in an edge case) have some explanation about how to trigger
them, like the excellent commit messages of Filipe in the commits
mentioned above. This helps setting up and maintaining the bug page, and
helps advanced users to decide if they're hitting the edge case or not
with their usage pattern.

I'd like to help creating/maintaining this bug overview. A good start
would be to just crawl through all stable kernels and some distro
kernels and see which commits show up in fs/btrfs.


As of right now, we kind of do have such a page:
https://btrfs.wiki.kernel.org/index.php/Gotchas
It's not really well labeled though, ans it's easy to overlook.

I specifically do not think we should worry about distro kernels though. 
 If someone is using a specific distro, that distro's documentation 
should cover what they support and what works and what doesn't.  Some 
(like Arch and to a lesser extent Gentoo) use almost upstream kernels, 
so there's very little point in tracking them.  Some (like Ubuntu and 
Debian) use almost upstream LTS kernels, so there's little point 
tracking them either.  Many others though (like CentOS, RHEL, and OEL) 
Use forked kernels that have so many back-ported patches that it's 
impossible to track up-date to up-date what the hell they've got.  A 
rather ridiculous expression regarding herding of cats comes to mind 
with respect to the last group.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Preliminary BTRFS Encryption

2016-09-15 Thread Alex Elsayed

On Thu, 15 Sep 2016 19:33:48 +0800, Anand Jain wrote:

> Thanks for commenting. pls see inline below.
> 
> On 09/15/2016 12:53 PM, Alex Elsayed wrote:
>> On Tue, 13 Sep 2016 21:39:46 +0800, Anand Jain wrote:
>>
>>> This patchset adds btrfs encryption support.
>>>
>>> The main objective of this series is to have bugs fixed and stability.
>>> I have verified with fstests to confirm that there is no regression.
>>>
>>> A design write-up is coming next, however here below is the quick
>>> example on the cli usage. Please try out, let me know if I have missed
>>> something.
>>>
>>> Also would like to mention that a review from the security experts is
>>> due,
>>> which is important and I believe those review comments can be
>>> accommodated without major changes from here.
>>>
>>> Also yes, thanks for the emails, I hear, per file encryption and
>>> inline with vfs layer is also important, which is wip among other
>>> things in the list.
>>>
>>> As of now these patch set supports encryption on per subvolume, as
>>> managing properties on per subvolume is a kind of core to btrfs, which
>>> is easier for data center solution-ing, seamlessly persistent and easy
>>> to manage.
>>>
>>>
>>> Steps:
>>> -
>>>
>>> Make sure following kernel TFMs are compiled in.
>>> # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>>> name : ctr(aes)
>>> name : cbc(aes)
>>
>> First problem: These are purely encryption algorithms, rather than AE
>> (Authenticated Encryption) or AEAD (Authenticated Encryption with
>> Associated Data). As a result, they are necessarily vulnerable to
>> adaptive chosen-ciphertext attacks, and CBC has historically had other
>> issues. I highly recommend using a well-reviewed AE or AEAD mode, such
>> as AES-GCM (as ecryptfs does), as long as the code can handle the
>> ciphertext being longer than the plaintext.
>>
>> If it _cannot_ handle the ciphertext being longer than the plaintext,
>> please consider that a very serious red flag: It means that you cannot
>> provide better security than block-level encryption, which greatly
>> reduces the benefit of filesystem-integrated encryption. Being at the
>> extent level _should_ permit using AEAD - if it does not, something is
>> wrong.
>>
>> If at all possible, I'd suggest _only_ permitting AEAD cipher modes to
>> be used.
>>
>> Anyway, even for block-level encryption, CTR and CBC have been
>> considered obsolete and potentially dangerous to use in disk encryption
>> for quite a while - current recommendations for block-level encryption
>> are to use either a narrow-block tweakable cipher mode (such as XTS),
>> or a wide- block one (such as EME or CMC), with the latter providing
>> slightly better security, but worse performance.
> 
>Yes. CTR should be changed, so I have kept it as a cli option. And
>with the current internal design, hope we can plugin more algorithms
>as suggested/if-its-outdated and yes code can handle (or with a
>little tweak) bigger ciphertext (than plaintext) as well.
> 
>encryption + keyhash (as below) + Btrfs-data-checksum provides
>similar to AE,  right ?

No, it does not provide anything remotely similar to AE. AE requires 
_cryptographic_ authentication of the data. Not only is a CRC (as Btrfs 
uses for the data checksum) not enough, a _cryptographic hash_ (such as 
SHA256) isn't even enough. A MAC (message authentication code) is 
necessary.

Moreover, combining an encryption algorithm and a MAC is very easy to get 
wrong, in ways that absolutely ruin security - as an example, see the 
Vaudenay/Lucky13 padding oracle attacks on TLS.

In order for this to be secure, you need to use a secure encryption 
system that also authenticates the data in a cryptographically secure 
manner. Certain schemes are well-studied and believed to be secure - AES-
GCM and ChaCha20-Poly1305 are common and well-regarded, and there's a 
generic security reduction for Encrypt-then-MAC constructions (using CTR 
together with HMAC in such a construction is generally acceptable).

The Btrfs data checksum is wholly inadequate, and the keyhash is a non-
sequitur - it prevents accidentally opening the subvolume with the wrong 
key, but neither it (nor the btrfs data checksum, which is a CRC rather 
than a cryptographic MAC) protect adequately against malicious corruption 
of the ciphertext.

I'd suggest pulling in Herbert Xu, as he'd likely be able to tell you 
what of the Crypto API is actually sane to use for this.

>>> Create encrypted subvolume.
>>> # btrfs su create -e 'ctr(aes)' /btrfs/e1 Create subvolume '/btrfs/e1'
>>> Passphrase:
>>> Again passphrase:
>>
>> I presume the command first creates a key, then creates a subvolume
>> referencing that key? If so, that seems sensible.
> 
>   Hmm I didn't get the why part, any help ? (this doesn't encrypt
>   metadata part).

Basically, if your tool merely sets up an entry in the kernel keyring, 
then calls the subvolume creation interface (passing in the key ID), then 
it

Re: [RFC] Preliminary BTRFS Encryption

2016-09-15 Thread Austin S. Hemmelgarn


On 2016-09-13 09:39, Anand Jain wrote:


This patchset adds btrfs encryption support.

The main objective of this series is to have bugs fixed and stability.
I have verified with fstests to confirm that there is no regression.

A design write-up is coming next, however here below is the quick example
on the cli usage. Please try out, let me know if I have missed something.

Also would like to mention that a review from the security experts is due,
which is important and I believe those review comments can be accommodated
without major changes from here.

Also yes, thanks for the emails, I hear, per file encryption and inline
with vfs layer is also important, which is wip among other things in the
list.

As of now these patch set supports encryption on per subvolume, as
managing properties on per subvolume is a kind of core to btrfs, which is
easier for data center solution-ing, seamlessly persistent and easy to
manage.


Steps:
-

Make sure following kernel TFMs are compiled in.
# cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
name : ctr(aes)
name : cbc(aes)

Create encrypted subvolume.
# btrfs su create -e 'ctr(aes)' /btrfs/e1
Create subvolume '/btrfs/e1'
Passphrase:
Again passphrase:

A key is created and its hash is updated into the subvolume item,
and then added to the system keyctl.
# btrfs su show /btrfs/e1 | egrep -i encrypt
Encryption: ctr(aes)@btrfs:75197c8e (594790215)

# keyctl show 594790215
Keyring
 594790215 --alsw-v  0 0  logon: btrfs:75197c8e


Now any file data extents under the subvol /btrfs/e1 will be
encrypted.

You may revoke key using keyctl or btrfs(8) as below.
# btrfs su encrypt -k out /btrfs/e1

# btrfs su show /btrfs/e1 | egrep -i encrypt
Encryption: ctr(aes)@btrfs:75197c8e (Required key not 
available)

# keyctl show 594790215
Keyring
Unable to dump key: Key has been revoked

As the key hash is updated, If you provide wrong passphrase in the next
key in, it won't add key to the system. So we have key verification
from the day1.

# btrfs su encrypt -k in /btrfs/e1
Passphrase:
Again passphrase:
ERROR: failed to set attribute 'btrfs.encrypt' to 'ctr(aes)@btrfs:75197c8e' : 
Key was rejected by service

ERROR: key set failed: Key was rejected by service

# btrfs su encrypt -k in /btrfs/e1
Passphrase:
Again passphrase:
key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'

Now if you revoke the key the read / write fails with key error.

# md5sum /btrfs/e1/2k-test-file
8c9fbc69125ebe84569a5c1ca088cb14  /btrfs/e1/2k-test-file

# btrfs su encrypt -k out /btrfs/e1

# md5sum /btrfs/e1/2k-test-file
md5sum: /btrfs/e1/2k-test-file: Key has been revoked

# cp /tfs/1k-test-file /btrfs/e1/
cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been revoked

Plain text memory scratches for security reason is pending. As there are some
key revoke notification challenges to coincide with encryption context switch,
which I do believe should be fixed in the due course, but is not a roadblock
at this stage.

Before I make any other comments, I should state that I asbolutely agree 
with Alex Elsayed about the issues with using CBC or CTR mode, and not 
supporting AE or AEAD modes.  If that's going to be the case, then 
there's essentially no point in merging this as is, as it has worse 
security than other filesystem level encryption options in the kernel by 
a pretty significant margin.  This absolutely _needs_ to be done right 
the first time, otherwise the reputation of BTRFS will suffer further, 
and nobody sane is going to use subvolume encryption for years after 
it's 'fixed' to be properly secure.


Now, the other thing I wanted to comment about:
How does this handle cloning of extents?  Can extents be cloned across 
subvolume boundaries when one of the subvolumes is encrypted?  Can they 
be cloned within an encrypted subvolume?  What happens when you try to 
clone them in either case if it isn't supported?


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] Preliminary BTRFS Encryption

2016-09-15 Thread Anand Jain



Thanks for commenting. pls see inline below.

On 09/15/2016 12:53 PM, Alex Elsayed wrote:

On Tue, 13 Sep 2016 21:39:46 +0800, Anand Jain wrote:


This patchset adds btrfs encryption support.

The main objective of this series is to have bugs fixed and stability.
I have verified with fstests to confirm that there is no regression.

A design write-up is coming next, however here below is the quick
example on the cli usage. Please try out, let me know if I have missed
something.

Also would like to mention that a review from the security experts is
due,
which is important and I believe those review comments can be
accommodated without major changes from here.

Also yes, thanks for the emails, I hear, per file encryption and inline
with vfs layer is also important, which is wip among other things in the
list.

As of now these patch set supports encryption on per subvolume, as
managing properties on per subvolume is a kind of core to btrfs, which
is easier for data center solution-ing, seamlessly persistent and easy
to manage.


Steps:
-

Make sure following kernel TFMs are compiled in.
# cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
name : ctr(aes)
name : cbc(aes)


First problem: These are purely encryption algorithms, rather than AE
(Authenticated Encryption) or AEAD (Authenticated Encryption with
Associated Data). As a result, they are necessarily vulnerable to
adaptive chosen-ciphertext attacks, and CBC has historically had other
issues. I highly recommend using a well-reviewed AE or AEAD mode, such as
AES-GCM (as ecryptfs does), as long as the code can handle the ciphertext
being longer than the plaintext.

If it _cannot_ handle the ciphertext being longer than the plaintext,
please consider that a very serious red flag: It means that you cannot
provide better security than block-level encryption, which greatly
reduces the benefit of filesystem-integrated encryption. Being at the
extent level _should_ permit using AEAD - if it does not, something is
wrong.

If at all possible, I'd suggest _only_ permitting AEAD cipher modes to be
used.

Anyway, even for block-level encryption, CTR and CBC have been considered
obsolete and potentially dangerous to use in disk encryption for quite a
while - current recommendations for block-level encryption are to use
either a narrow-block tweakable cipher mode (such as XTS), or a wide-
block one (such as EME or CMC), with the latter providing slightly better
security, but worse performance.


  Yes. CTR should be changed, so I have kept it as a cli option. And
  with the current internal design, hope we can plugin more algorithms
  as suggested/if-its-outdated and yes code can handle (or with a little
  tweak) bigger ciphertext (than plaintext) as well.

  encryption + keyhash (as below) + Btrfs-data-checksum provides
  similar to AE,  right ?



Create encrypted subvolume.
# btrfs su create -e 'ctr(aes)' /btrfs/e1 Create subvolume '/btrfs/e1'
Passphrase:
Again passphrase:


I presume the command first creates a key, then creates a subvolume
referencing that key? If so, that seems sensible.


 Hmm I didn't get the why part, any help ? (this doesn't encrypt
 metadata part).


A key is created and its hash is updated into the subvolume item,
and then added to the system keyctl.
# btrfs su show /btrfs/e1 | egrep -i encrypt
Encryption: ctr(aes)@btrfs:75197c8e (594790215)

# keyctl show 594790215 Keyring
 594790215 --alsw-v  0 0  logon: btrfs:75197c8e


That's entirely reasonable, though you may want to support "trusted and
encrypted keys" (Documentation/security/keys-trusted-encrypted.txt)


  Yes. that's in the list.


Now any file data extents under the subvol /btrfs/e1 will be encrypted.

You may revoke key using keyctl or btrfs(8) as below.
# btrfs su encrypt -k out /btrfs/e1

# btrfs su show /btrfs/e1 | egrep -i encrypt
Encryption: ctr(aes)@btrfs:75197c8e (Required key not

available)


# keyctl show 594790215 Keyring Unable to dump key: Key has been revoked

As the key hash is updated, If you provide wrong passphrase in the next
key in, it won't add key to the system. So we have key verification from
the day1.


This is good.


  Thanks.

Thanks, Anand



# btrfs su encrypt -k in /btrfs/e1 Passphrase:
Again passphrase:
ERROR: failed to set attribute 'btrfs.encrypt' to
'ctr(aes)@btrfs:75197c8e' : Key was rejected by service

ERROR: key set failed: Key was rejected by service

# btrfs su encrypt -k in /btrfs/e1 Passphrase:
Again passphrase:
key for '/btrfs/e1' has  logged in with keytag 'btrfs:75197c8e'

Now if you revoke the key the read / write fails with key error.

# md5sum /btrfs/e1/2k-test-file 8c9fbc69125ebe84569a5c1ca088cb14
/btrfs/e1/2k-test-file

# btrfs su encrypt -k out /btrfs/e1

# md5sum /btrfs/e1/2k-test-file md5sum: /btrfs/e1/2k-test-file: Key has
been revoked

# cp /tfs/1k-test-file /btrfs/e1/
cp: cannot create regular file ‘/btrfs/e1/1k-test-file’: Key has been
revoked

Plain text

Re: [RFC] Preliminary BTRFS Encryption

2016-09-15 Thread Anand Jain



Thanks for the comments. Pls see inline below..

On 09/15/2016 01:38 PM, Chris Murphy wrote:

On Tue, Sep 13, 2016 at 7:39 AM, Anand Jain  wrote:


This patchset adds btrfs encryption support.

The main objective of this series is to have bugs fixed and stability.
I have verified with fstests to confirm that there is no regression.

A design write-up is coming next, however here below is the quick example
on the cli usage. Please try out, let me know if I have missed something.


What's the behavior with nested subvolumes having different keys?

subvolume A (encrypted with key A)
|
- subvolume B (encrypted with key B)

Without encryption I can discover either A or B whether top-level, A,
or B are mounted.

With encryption, must A be opened [1] for B to be discovered? Must A
be opened before B can be opened? Or is the subvolume metadata always
non-encrypted, and it's just file extents that are encrypted? Are
filenames in those subvolumes discoverable (e.g. btrfs-debug-tree,
btrfs-image) if the subvolume is not opened? And reflink handling
between subvolumes behaves how?


  nested encrypting subvolume isn't supported, its just that it wasn't
  in my mind or the use case analysis review which I did, didn't tell
  me that. However I did a bit of code changes, its not that tough to
  get that in the current setup though. Yes only extent encrypted.

Thanks, Anand



[1] open in the cryptsetup open/luksOpen sense



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: stability matrix

2016-09-15 Thread Hans van Kranenburg

On 09/15/2016 04:14 AM, Christoph Anton Mitterer wrote:
> Hey.
> 
> As for the stability matrix...
> 
> In general:
> - I think another column should be added, which tells when and for
>   which kernel version the feature-status of each row was 
>   revised/updated the last time and especially by whom.
>   If a core dev makes a statement on a particular feature, this
>   probably means much more, than if it was made by "just" a list
>   regular.
>   And yes I know, in the beginning it already says "this is for 4.7"...
>   but let's be honest, it's pretty likely when this is bumped to 4.8
>   that not each and every point will be thoroughly checked again.
> - Optionally even one further column could be added, that lists bugs
>   where the specific cases are kept record of (if any).
> - Perhaps a 3rd Status like "eats-your-data" which is worse than
>   critical, e.g. for things were it's known that there is a high
>   chance for still getting data corruption (RAID56?)

About the "for 4.7" issue... The Status page could have an extra column,
which for every OK labeled row lists the first version (kernel.org x.y.0
release) it's OK for.

The bugs make it more complicated.

* Feature A is labeled OK in kernel 5.0
* During development of kernel 8-rc, an eat my data bug is fixed. The OK
for this feature in the table is bumped to 8.0?
* kernel 5 is EOL
* kernel 6 is still supported, and the fix is applied to 6.12
* then there's distros which have their own old kernels, applying fixes
on them whenever they like, for example 5.6-distro4 which is leading its
own life

"Normal" users are using distro kernels. They shouldn't be panicing
about their data if they're running 6.14 or 5.6-distro4, but the OK in
the table is bumped to 8.0 because of the serious bugs.

At least the official kernels should be tracked in the table I think.

Separately, a list of known serious bugs per feature (like the 4 about
compression, http://www.spinics.net/lists/linux-btrfs/msg58674.html )
could be listed on another Bugs! page (lots of work) so a user, or
someone helping the user can see if the listed commits are or aren't
included in the actual whatever kernel a user is using.

This list of serious bugs could also help disussions that now sound like
"yeah, there were issues with compression which some time ago got fixed,
but noone knows what it was and when, so don't use compression".

Many of the commits which fix serious bugs (even if they're only
triggered in an edge case) have some explanation about how to trigger
them, like the excellent commit messages of Filipe in the commits
mentioned above. This helps setting up and maintaining the bug page, and
helps advanced users to decide if they're hitting the edge case or not
with their usage pattern.

I'd like to help creating/maintaining this bug overview. A good start
would be to just crawl through all stable kernels and some distro
kernels and see which commits show up in fs/btrfs.

-- 
Hans van Kranenburg
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH]btrfs-progs: Add fast,slow symlinks and fifo types to convert test

2016-09-15 Thread Lakshmipathi.G

Signed-off-by: Lakshmipathi.G 
---
 tests/common.convert | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tests/common.convert b/tests/common.convert
index 67c99b1..2790be5 100644
--- a/tests/common.convert
+++ b/tests/common.convert
@@ -25,10 +25,10 @@ generate_dataset() {
done
;;
 
-   symlink)
+   fast_symlink)
for num in $(seq 1 $DATASET_SIZE); do
run_check $SUDO_HELPER touch 
$dirpath/$dataset_type.$num
-   run_check $SUDO_HELPER ln -s 
$dirpath/$dataset_type.$num $dirpath/slink.$num
+   run_check $SUDO_HELPER cd $dirpath && ln -s 
$dataset_type.$num $dirpath/slink.$num && cd /
done
;;
 
@@ -71,12 +71,24 @@ generate_dataset() {
run_check $SUDO_HELPER setfattr -n user.foo -v 
bar$num $dirpath/$dataset_type.$num
done
;;
+   fifo)
+   for num in $(seq 1 $DATASET_SIZE); do
+   run_check $SUDO_HELPER mkfifo 
$dirpath/$dataset_type.$num
+   done
+   ;;
+   slow_symlink)
+   for num in $(seq 1 $DATASET_SIZE); do
+   fname64=`date +%s | sha256sum | cut -f1 -d'-'`
+   run_check $SUDO_HELPER touch $dirpath/$fname64
+   run_check $SUDO_HELPER ln -s $dirpath/$fname64 
$dirpath/slow_slink.$num
+   done
+   ;;
esac
 }
 
 populate_fs() {
 
-for dataset_type in 'small' 'hardlink' 'symlink' 'brokenlink' 'perm' 
'sparse' 'acls'; do
+for dataset_type in 'small' 'hardlink' 'fast_symlink' 'brokenlink' 
'perm' 'sparse' 'acls' 'fifo' 'slow_symlink'; do
generate_dataset "$dataset_type"
done
 }
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Is stability a joke?

2016-09-15 Thread Martin Steigerwald

Am Donnerstag, 15. September 2016, 07:55:36 CEST schrieb Kai Krakow:
> Am Mon, 12 Sep 2016 08:20:20 -0400
> 
> schrieb "Austin S. Hemmelgarn" :
> > On 2016-09-11 09:02, Hugo Mills wrote:
> > > On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote:
> > >> Martin Steigerwald wrote:
> >  [...]
> >  [...]
> >  [...]
> >  [...]
> >  
> > >> That is exactly the same reason I don't edit the wiki myself. I
> > >> could of course get it started and hopefully someone will correct
> > >> what I write, but I feel that if I start this off I don't have deep
> > >> enough knowledge to do a proper start. Perhaps I will change my
> > >> mind about this.
> > >> 
> > >Given that nobody else has done it yet, what are the odds that
> > > 
> > > someone else will step up to do it now? I would say that you should
> > > at least try. Yes, you don't have as much knowledge as some others,
> > > but if you keep working at it, you'll gain that knowledge. Yes,
> > > you'll probably get it wrong to start with, but you probably won't
> > > get it *very* wrong. You'll probably get it horribly wrong at some
> > > point, but even the more knowledgable people you're deferring to
> > > didn't identify the problems with parity RAID until Zygo and Austin
> > > and Chris (and others) put in the work to pin down the exact
> > > issues.
> > 
> > FWIW, here's a list of what I personally consider stable (as in, I'm
> > willing to bet against reduced uptime to use this stuff on production
> > systems at work and personal systems at home):
> > 1. Single device mode, including DUP data profiles on single device
> > without mixed-bg.
> > 2. Multi-device raid0, raid1, and raid10 profiles with symmetrical
> > devices (all devices are the same size).
> > 3. Multi-device single profiles with asymmetrical devices.
> > 4. Small numbers (max double digit) of snapshots, taken at infrequent
> > intervals (no more than once an hour).  I use single snapshots
> > regularly to get stable images of the filesystem for backups, and I
> > keep hourly ones of my home directory for about 48 hours.
> > 5. Subvolumes used to isolate parts of a filesystem from snapshots.
> > I use this regularly to isolate areas of my filesystems from backups.
> > 6. Non-incremental send/receive (no clone source, no parent's, no
> > deduplication).  I use this regularly for cloning virtual machines.
> > 7. Checksumming and scrubs using any of the profiles I've listed
> > above. 8. Defragmentation, including autodefrag.
> > 9. All of the compat_features, including no-holes and skinny-metadata.
> > 
> > Things I consider stable enough that I'm willing to use them on my
> > personal systems but not systems at work:
> > 1. In-line data compression with compress=lzo.  I use this on my
> > laptop and home server system.  I've never had any issues with it
> > myself, but I know that other people have, and it does seem to make
> > other things more likely to have issues.
> > 2. Batch deduplication.  I only use this on the back-end filesystems
> > for my personal storage cluster, and only because I have multiple
> > copies as a result of GlusterFS on top of BTRFS.  I've not had any
> > significant issues with it, and I don't remember any reports of data
> > loss resulting from it, but it's something that people should not be
> > using if they don't understand all the implications.
> 
> I could at least add one "don't do it":
> 
> Don't use BFQ patches (it's an IO scheduler) if you're using btrfs.
> Some people like to use it especially for running VMs and desktops
> because it provides very good interactivity while maintaining very good
> throughput. But it completely destroyed my btrfs beyond repair at least
> twice, either while actually using a VM (in VirtualBox) or during high
> IO loads. I now stick to the deadline scheduler instead which provides
> very good interactivity for me, too, and the corruptions didn't occur
> again so far.
> 
> The story with BFQ has always been the same: System suddenly freezes
> during moderate to high IO until all processes stop working (no process
> shows D state, tho). Only hard reboot possible. After rebooting, access
> to some (unrelated) files may fail with "errno=-17 Object already
> exists" which cannot be repaired. If it affects files needed during
> boot, you are screwed because file system goes RO.

This could be a further row in the table. And well…

as for CFQ Jens Axboe currently works on bandwidth throttling patches 
*exactly* for the reason to provide more interactivity and fairness to I/O 
operations in between.

Right now, Completely Fair in CFQ is a *huge* exaggeration, at least while you 
have a dd bs=1M thing running.

Thanks,
-- 
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 4.4.0 - no space left with >1.7 TB free space left

2016-09-15 Thread Roman Mamedov

On Fri, 8 Apr 2016 16:53:32 +0500
Roman Mamedov  wrote:

> On Fri, 08 Apr 2016 20:36:26 +0900
> Tomasz Chmielewski  wrote:
> 
> > On 2016-02-08 20:24, Roman Mamedov wrote:
> > 
> > >> Linux 4.4.0 - btrfs is mainly used to host lots of test containers,
> > >> often snapshots, and at times, there is heavy IO in many of them for
> > >> extended periods of time. btrfs is on HDDs.
> > >> 
> > >> 
> > >> Every few days I'm getting "no space left" in a container running 
> > >> mongo
> > >> 3.2.1 database. Interestingly, haven't seen this issue in containers
> > >> with MySQL. All databases have chattr +C set on their directories.
> > > 
> > > Hello,
> > > 
> > > Do you snapshot the parent subvolume which holds the databases? Can you
> > > correlate that perhaps ENOSPC occurs at the time of snapshotting? If 
> > > yes, then
> > > you should try the patch https://patchwork.kernel.org/patch/7967161/
> > > 
> > > (Too bad this was not included into 4.4.1.)
> > 
> > By the way - was it included in any later kernel? I'm running 4.4.5 on 
> > that server, but still hitting the same issue.
> 
> It's not in 4.4.6 either. I don't know why it doesn't get included, or what
> we need to do. Last time I asked, it was queued:
> http://www.spinics.net/lists/linux-btrfs/msg52478.html
> But maybe that meant 4.5 or 4.6 only? While the bug is affecting people on
> 4.4.x today.

This got applied now in 4.4.21, thanks.

-- 
With respect,
Roman


pgpqOFdEk406P.pgp
Description: OpenPGP digital signature

Re: Is stability a joke?

2016-09-15 Thread Martin Steigerwald

Hello Nicholas.

Am Mittwoch, 14. September 2016, 21:05:52 CEST schrieb Nicholas D Steeves:
> On Mon, Sep 12, 2016 at 08:20:20AM -0400, Austin S. Hemmelgarn wrote:
> > On 2016-09-11 09:02, Hugo Mills wrote:
[…]
> > As far as documentation though, we [BTRFS] really do need to get our act
> > together.  It really doesn't look good to have most of the best
> > documentation be in the distro's wikis instead of ours.  I'm not trying to
> > say the distros shouldn't be documenting BTRFS, but the point at which
> > Debian (for example) has better documentation of the upstream version of
> > BTRFS than the upstream project itself does, that starts to look bad.
> 
> I would have loved to have this feature-to-stability list when I
> started working on the Debian documentation!  I started it because I
> was saddened by number of horror story "adventures with btrfs"
> articles and posts I had read about, combined with the perspective of
> certain members within the Debian community that it was a toy fs.
> 
> Are my contributions to that wiki of a high enough quality that I
> can work on the upstream one?  Do you think the broader btrfs
> community is interested in citations and curated links to discussions?
> 
> eg: if a company wants to use btrfs, they check the status page, see a
> feature they want is still in the yellow zone of stabilisation, and
> then follow the links to familiarise themselves with past discussions.
> I imagine this would also help individuals or grad students more
> quickly familiarise themselves with the available literature before
> choosing a specific project.  If regular updates from SUSE, STRATO,
> Facebook, and Fujitsu are also publicly available the k.org wiki would
> be a wonderful place to syndicate them!

 I definately think the quality of your contributions is high enough, others 
can also proofread and give in their experiences, so… By *all* means, go ahead 
*already*.

It doesn´t fit all inside the table directly, I bet, *but* you can use 
footnotes or further explainations regarding features that need them with a 
headline per feature below the table and a link to it from within the table.

Thank you!
-- 
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

multi-device btrfs with single data mode and disk failure

2016-09-15 Thread Alexandre Poux

I had a btrfs partition on a 6 disk array without raid (metadata in
raid10, but data in single), and one of the disks just died.

So I lost some of my data, ok, I knew that.

But two question :

  *

Is it possible to know (using metadata I suppose) what data I have
lost ?

  *

Is it possible to do some king of a "btrfs delete missing" on this
kind of setup, in order to recover access in rw to my other data, or
I must copy all my data on a new partition

I haven't been able to get any answer on google or in the wiki, so I
send here an e-mail, hoping that's the right place. Excuse me, if I'm wrong.

Thank you for any help

(Sorry for my poor english)

uname -a :
Linux Grand-PC 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016
x86_64 GNU/Linux

btrfs --version :
btrfs-progs v4.7.1

btrfs fi show :
Label: 'Data'  uuid: 62db560b-a040-4c64-b613-6e7db033dc4d
Total devices 6 FS bytes used 6.66TiB
devid1 size 2.53TiB used 2.12TiB path /dev/sdd6
devid7 size 2.53TiB used 2.12TiB path /dev/sdb6
devid9 size 262.57GiB used 0.00B path /dev/sde6
devid   11 size 2.53TiB used 2.12TiB path /dev/sdc6
devid   12 size 728.32GiB used 312.03GiB path /dev/sda6
*** Some devices missing


mount -o recovery,ro,degraded /dev/sda6 /Data

relevant part of dmesg :
[ 1828.093704] BTRFS warning (device sda6): 'recovery' is deprecated,
use 'usebackuproot' instead
[ 1828.093708] BTRFS info (device sda6): trying to use backup root at
mount time
[ 1828.093718] BTRFS info (device sda6): allowing degraded mounts
[ 1828.093719] BTRFS info (device sda6): disk space caching is enabled
[ 1828.107763] BTRFS warning (device sda6): devid 8 uuid
950378c0-307c-413d-9805-ab2bb899aa78 missing

btrfs fi df /Data
Data, single: total=6.65TiB, used=6.65TiB
System, RAID1: total=32.00MiB, used=768.00KiB
Metadata, RAID1: total=13.00GiB, used=10.99GiB
GlobalReserve, single: total=512.00MiB, used=0.00B



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Is stability a joke?

2016-09-15 Thread Kai Krakow

Am Mon, 12 Sep 2016 08:20:20 -0400
schrieb "Austin S. Hemmelgarn" :

> On 2016-09-11 09:02, Hugo Mills wrote:
> > On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote:  
> >> Martin Steigerwald wrote:  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
> >> That is exactly the same reason I don't edit the wiki myself. I
> >> could of course get it started and hopefully someone will correct
> >> what I write, but I feel that if I start this off I don't have deep
> >> enough knowledge to do a proper start. Perhaps I will change my
> >> mind about this.  
> >
> >Given that nobody else has done it yet, what are the odds that
> > someone else will step up to do it now? I would say that you should
> > at least try. Yes, you don't have as much knowledge as some others,
> > but if you keep working at it, you'll gain that knowledge. Yes,
> > you'll probably get it wrong to start with, but you probably won't
> > get it *very* wrong. You'll probably get it horribly wrong at some
> > point, but even the more knowledgable people you're deferring to
> > didn't identify the problems with parity RAID until Zygo and Austin
> > and Chris (and others) put in the work to pin down the exact
> > issues.  
> FWIW, here's a list of what I personally consider stable (as in, I'm 
> willing to bet against reduced uptime to use this stuff on production 
> systems at work and personal systems at home):
> 1. Single device mode, including DUP data profiles on single device 
> without mixed-bg.
> 2. Multi-device raid0, raid1, and raid10 profiles with symmetrical 
> devices (all devices are the same size).
> 3. Multi-device single profiles with asymmetrical devices.
> 4. Small numbers (max double digit) of snapshots, taken at infrequent 
> intervals (no more than once an hour).  I use single snapshots
> regularly to get stable images of the filesystem for backups, and I
> keep hourly ones of my home directory for about 48 hours.
> 5. Subvolumes used to isolate parts of a filesystem from snapshots.
> I use this regularly to isolate areas of my filesystems from backups.
> 6. Non-incremental send/receive (no clone source, no parent's, no 
> deduplication).  I use this regularly for cloning virtual machines.
> 7. Checksumming and scrubs using any of the profiles I've listed
> above. 8. Defragmentation, including autodefrag.
> 9. All of the compat_features, including no-holes and skinny-metadata.
> 
> Things I consider stable enough that I'm willing to use them on my 
> personal systems but not systems at work:
> 1. In-line data compression with compress=lzo.  I use this on my
> laptop and home server system.  I've never had any issues with it
> myself, but I know that other people have, and it does seem to make
> other things more likely to have issues.
> 2. Batch deduplication.  I only use this on the back-end filesystems
> for my personal storage cluster, and only because I have multiple
> copies as a result of GlusterFS on top of BTRFS.  I've not had any
> significant issues with it, and I don't remember any reports of data
> loss resulting from it, but it's something that people should not be
> using if they don't understand all the implications.

I could at least add one "don't do it":

Don't use BFQ patches (it's an IO scheduler) if you're using btrfs.
Some people like to use it especially for running VMs and desktops
because it provides very good interactivity while maintaining very good
throughput. But it completely destroyed my btrfs beyond repair at least
twice, either while actually using a VM (in VirtualBox) or during high
IO loads. I now stick to the deadline scheduler instead which provides
very good interactivity for me, too, and the corruptions didn't occur
again so far.

The story with BFQ has always been the same: System suddenly freezes
during moderate to high IO until all processes stop working (no process
shows D state, tho). Only hard reboot possible. After rebooting, access
to some (unrelated) files may fail with "errno=-17 Object already
exists" which cannot be repaired. If it affects files needed during
boot, you are screwed because file system goes RO.

-- 
Regards,
Kai

Replies to list-only preferred.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

47 matches

Mail list logo