Re: What if TRIM issued a wipe on devices that don't TRIM?

2018-12-06 Thread ronnie sahlberg
Hi,

I am more of a SCSI guy than ATA so forgive where I am ignorant.

The SCSI equivalent to TRIM is called UNMAP.
UNMAP is unfortunately only a "hint" to the device, so if the device is
busy for any reason it can just do a no-op, leave the data as is, and
still return SUCCESS.
That is not good if you want to wipe data for confidentiality reasons :-)

As UNMAP and TRIM are related, make sure that TRIM actually guarantees
to wipe the data. I do not know ATA well enough to say whether it does.

In SCSI, instead of UNMAP/TRIM you can use WRITE SAME (10/16), which
does provide an overwrite/wipe guarantee. Maybe ATA has something
equivalent to WRITE SAME (10/16).

I just want to say: be careful, because some of these commands do not
guarantee that the data is actually overwritten and unretrievable.
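
As a rough illustration of the hint-vs-guarantee difference (a sketch
only, assuming util-linux is installed and /dev/sdX is a disposable test
device; it says nothing about any particular drive's firmware):

# hint only: the device may silently ignore this and still return success
blkdiscard /dev/sdX

# explicit overwrite: zero the whole device (on SCSI this may be translated
# to WRITE SAME, elsewhere to plain writes), which does guarantee the old
# data is gone
blkdiscard --zeroout /dev/sdX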

ronnie sahlberg


On Thu, Dec 6, 2018 at 4:24 PM Robert White  wrote:
>
> (1) Automatic and selective wiping of unused and previously used disk
> blocks is a good security measure, particularly when there is an
> encryption layer beneath the file system.
>
> (2) USB attached devices _never_ support TRIM and they are the most
> likely to fall into strangers hands.
>
> (3) I vaguely recall that some flash chips will take bulk writes of
> full sectors of 0x00 or 0xFF (I don't remember which) as second-best
> to TRIM for letting the flash controllers defragment their internals.
>
> So it would be dog-slow, but it would be neat if BTRFS had a mount
> option to convert any TRIM command from above into the write of a zero,
> 0xFF, or trash block to the device below if that device doesn't support
> TRIM. Real TRIM support would override the block write.
>
> Obviously doing an fstrim would involve a lot of slow device writes but
> only for people likely to do that sort of thing.
>
> For testing purposes the destruction of unused pages in this manner
> might catch file system failures or coding errors.
>
> (The other layer where this might be most appropriate is in cryptsetup
> et al, where it could lie about TRIM support, but that sort of stealth
> lag might be bad for filesystem-level operations. Doing it there would
> also lose the simpler USB use cases.)
>
> ...Just a thought...
>
> --Rob White.
>
>


Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog

2017-11-02 Thread ronnie sahlberg
I think it is just a matter of lack of resources.
The very few paid people working on btrfs probably do not have parity
raid as a priority.
(And honestly, parity raid is probably much better implemented below
the filesystem in any case, i.e. in the md driver or in the array
itself.)

Also, until at least about a year ago, RAID56 was known to be
completely broken in btrfs and would destroy all your data.
Not a question of if, but when.

So, considering the state of parity raid in btrfs, it is understandable
that the few resources available would not be spent on Andrea's 6-parity
raid code.
I don't follow the parity raid code in btrfs closely; it might be
fixed by now or it might still be pathologically broken. I don't know.
I assume it is still deadly to use btrfs raid5/6.


That said, it is a tragedy that the MDADM folks did not pick up on Andrea's work.
While it is really just Reed-Solomon coding, his breakthrough was
finding a 6-parity Reed-Solomon encoding where the first two parities
are identical to the RAID5/6 parities.
I.e. you could add a third parity to a normal RAID6 and thus create a
3-parity system without having to recompute the first and second
parities.
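
For reference, the textbook RAID5/RAID6 parities over data blocks D_i are
roughly (this is just the standard definition, not Andrea's actual
construction):

\[
P = \bigoplus_i D_i, \qquad Q = \bigoplus_i g^{\,i} D_i
\quad\text{(arithmetic in } \mathrm{GF}(2^8)\text{)}
\]

and the point of his encoding is that the additional parities are chosen
so that P and Q above stay exactly as they are, which is why an existing
RAID6 array would not need its first two parities recomputed.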




On Thu, Nov 2, 2017 at 12:45 PM, Dave  wrote:
> Has this been discussed here? Has anything changed since it was written?
>
> Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS
> and MDADM (Dec 2014) – Ronny Egners Blog
> http://blog.ronnyegner-consulting.de/2014/12/10/parity-based-redundancy-raid56triple-parity-and-beyond-on-btrfs-and-mdadm-dec-2014/comment-page-1/
>
> TL;DR: There are patches to extend the linux kernel to support up to 6
> parity disks but BTRFS does not want them because it does not fit
> their “business case” and MDADM would want them but somebody needs to
> develop patches for the MDADM component. The kernel raid
> implementation is ready and usable. If someone volunteers to do this
> kind of work I would support with equipment and myself as a test
> resource.


Re: Encountered kernel bug#72811. Advice on recovery?

2017-04-14 Thread ronnie sahlberg
On Thu, Apr 13, 2017 at 8:47 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Ank Ular posted on Thu, 13 Apr 2017 14:49:41 -0400 as excerpted:
...
> OK, I'm one of the ones that's going to "go off" on you, but FWIW, I
> expect pretty much everyone else would pretty much agree.  At least you
> do have backups. =:^)
>
> I don't think you appreciate just how bad raid56 is ATM.  There are just
> too many REALLY serious bugs like the one you mention with it, and it's
> actively NEGATIVELY recommended here as a result.  It's bad enough with
> even current kernels, and the problems are well known enough to the devs,
> that there's really not a whole lot to test ATM...

Can we please hide the ability to even create any new raid56
filesystems behind a new flag:

--i-accept-total-data-loss

to make sure that folks are prepared for how risky it currently is.
That should be an easy patch to the userland utilities.
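
Usage would then look something like this (purely hypothetical -- no such
flag exists today, only the -d/-m raid5 syntax is real):

mkfs.btrfs -d raid5 -m raid5 --i-accept-total-data-loss /dev/sdb /dev/sdc /dev/sdd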


Re: Is it possible to speed up unlink()?

2016-10-20 Thread ronnie sahlberg
On Thu, Oct 20, 2016 at 7:44 AM, Austin S. Hemmelgarn
 wrote:
> On 2016-10-20 09:47, Timofey Titovets wrote:
>>
>> 2016-10-20 15:09 GMT+03:00 Austin S. Hemmelgarn :
>>>
>>> On 2016-10-20 05:29, Timofey Titovets wrote:


 Hi, i use btrfs for NFS VM replica storage and for NFS shared VM
 storage.
 At now i have a small problem what VM image deletion took to long time
 and NFS client show a timeout on deletion
 (ESXi Storage migration as example).

 Kernel: Linux nfs05 4.7.0-0.bpo.1-amd64 #1 SMP Debian 4.7.5-1~bpo8+2
 (2016-10-01) x86_64 GNU/Linux
 Mount options: noatime,compress-force=zlib,space_cache,commit=180
 Feature enabled:
 big_metadata:1
 compress_lzo:1
 extended_iref:1
 mixed_backref:1
 no_holes:1
 skinny_metadata:1

 AFAIK, unlink() return only when all references to all extents from
 unlinked inode will be deleted
 So with compression enabled files have a many many refs to each
 compressed chunk.
 So, it's possible to return unlink() early? or this a bad idea(and why)?
>>>
>>>
>>> I may be completely off about this, but I could have sworn that unlink()
>>> returns when enough info is on the disk that both:
>>> 1. The file isn't actually visible in the directory.
>>> 2. If the system crashes, the filesystem will know to finish the cleanup.
>>>
>>> Out of curiosity, what are the mount options (and export options) for the
>>> NFS share?  I have a feeling that that's also contributing.  In
>>> particular,
>>> if you're on a reliable network, forcing UDP for mounting can
>>> significantly
>>> help performance, and if your server is reliable, you can set NFS to run
>>> asynchronously to make unlink() return almost immediately.
>>
>>
>>
>> For NFS export i use:
>> rw,no_root_squash,async,no_subtree_check,fsid=1
>> AFAIK ESXi don't support nfs with udp
>
> That doesn't surprise me.  If there's any chance of packet loss, then NFS
> over UDP risks data corruption, so a lot of 'professional' software only
> supports NFS over TCP.  The thing is though, in a vast majority of networks
> ESXi would be running in, there's functionally zero chance of packet loss
> unless there's a hardware failure.
>>
>> And you right on normal Linux client async work pretty good and
>> deletion of big file are pretty fast (but also it's can lock nfsd on
>> nfs server for long time, while he do unlink()).
>
> You might also try with NFS-Ganesha instead of the Linux kernel NFS server.
> It scales a whole lot better and tends to be a bit smarter, so it might help
> (especially since it gives better NFS over TCP performance than the kernel
> server too).  The only significant downside is that it's somewhat lacking in
> good documentation.

He is using NFS and removing a single file.
This involves only two packets being exchanged between client and server:
-> NFSv3 REMOVE request
and
<- NFSv3 REMOVE reply

These packets are both < 100 bytes in size.
On the server side, knfsd as well as Ganesha both pretty much just
call unlink() for this request.

This looks like a pure BTRFS issue and I can not see how knfsd vs
ganesha or tcp vs udp would help.
Traditional NFS clients allow tweaking for impossibly slow servers,
for example via the 'timeo' client mount option.


Maybe ESXi has a similar option to make it more tolerant before it
decides "the server has not responded within a reasonable timeout, so
consider it dead and return EIO to the application."
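
On a Linux client that tuning would look roughly like the sketch below
(illustrative only; the export path is made up, and whatever knob ESXi
has, if any, would be something else entirely):

# allow a 60 second RPC timeout and a few retries before declaring the
# server dead
mount -t nfs -o vers=3,timeo=600,retrans=5 nfs05:/export /mnt/vmstore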




>>
>>
>>> Now, on top of that, you should probably look at adding 'lazytime' to the
>>> mount options for BTRFS.  This will cause updates to file time-stamps
>>> (not
>>> just atime, but mtime also, it has no net effect on ctime though, because
>>> a
>>> ctime update means something else in the inode got updated) to be
>>> deferred
>>> up to 24 hours or until the next time the inode would be written out,
>>> which
>>> can significantly improve performance on BTRFS because of the
>>> write-amplification.  It's not hugely likely to improve performance for
>>> unlink(), but it should improve write performance some, which may help in
>>> general.
>>
>>
>> Thanks for lazytime i forgot about it %)
>> On my debian servers i can't apply it with error:
>> BTRFS info (device sdc1): unrecognized mount option 'lazytime'
>> But successful apply it to my arch box (Linux 4.8.2)
>
> That's odd, 4.7 kernels definitely have support for it (I've been using it
> since 4.7.0 on all my systems, but I build upstream kernels).
>>
>>
>> For fast unlink(), i just think about subvolume like behaviour, then
>> it's possible to fast delete subvolume (without commit) and then
>> kernel will clean data in the background.
>
> There's two other possibilities I can think of to improve this.  One is
> putting each VM image in it's own subvolume, but that then means you almost
> certainly can't use ESXi to delete the images directly, although it will
> likely get 

Re: RAID system with adaption to changed number of disks

2016-10-11 Thread ronnie sahlberg
On Tue, Oct 11, 2016 at 8:14 AM, Philip Louis Moetteli
 wrote:
>
> Hello,
>
>
> I have to build a RAID 6 with the following 3 requirements:


You should under no circumstances use RAID5/6 for anything other than
test and throw-away data.
It has several known issues that will eat your data. Total data loss
is a real possibility.

(the capability to even create raid5/6 filesystems should imho be
removed from btrfs until this changes.)

>
> • Use different kinds of disks with different sizes.
> • When a disk fails and there's enough space, the RAID should be able 
> to reconstruct itself out of the degraded state. Meaning, if I have e. g. a 
> RAID with 8 disks and 1 fails, I should be able to chose to transform this in 
> a non-degraded (!) RAID with 7 disks.
> • Also the other way round: If I add a disk of what size ever, it 
> should redistribute the data, so that it becomes a RAID with 9 disks.
>
> I don’t care, if I have to do it manually.
> I don’t care so much about speed either.
>
> Is BTrFS capable of doing that?
>
>
> Thanks a lot for your help!


Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread ronnie sahlberg
On Sat, Jun 25, 2016 at 7:53 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Chris Murphy posted on Sat, 25 Jun 2016 11:25:05 -0600 as excerpted:
>
>> Wow. So it sees the data strip corruption, uses good parity on disk to
>> fix it, writes the fix to disk, recomputes parity for some reason but
>> does it wrongly, and then overwrites good parity with bad parity?
>> That's fucked. So in other words, if there are any errors fixed up
>> during a scrub, you should do a 2nd scrub. The first scrub should make
>> sure data is correct, and the 2nd scrub should make sure the bug is
>> papered over by computing correct parity and replacing the bad parity.
>>
>> I wonder if the same problem happens with balance or if this is just a
>> bug in scrub code?
>
> Could this explain why people have been reporting so many raid56 mode
> cases of btrfs replacing a first drive appearing to succeed just fine,
> but then they go to btrfs replace a second drive, and the array crashes
> as if the first replace didn't work correctly after all, resulting in two
> bad devices once the second replace gets under way, of course bringing
> down the array?
>
> If so, then it looks like we have our answer as to what has been going
> wrong that has been so hard to properly trace and thus to bugfix.
>
> Combine that with the raid4 dedicated parity device behavior you're
> seeing if the writes are all exactly 128 MB, with that possibly
> explaining the super-slow replaces, and this thread may have just given
> us answers to both of those until-now-untraceable issues.
>
> Regardless, what's /very/ clear by now is that raid56 mode as it
> currently exists is more or less fatally flawed, and a full scrap and
> rewrite to an entirely different raid56 mode on-disk format may be
> necessary to fix it.
>
> And what's even clearer is that people /really/ shouldn't be using raid56
> mode for anything but testing with throw-away data, at this point.
> Anything else is simply irresponsible.
>
> Does that mean we need to put a "raid56 mode may eat your babies" level
> warning in the manpage and require a --force to either mkfs.btrfs or
> balance to raid56 mode?  Because that's about where I am on it.

Agree. At this point letting ordinary users create raid56 filesystems
is counterproductive.


I would suggest:

1, A much more strongly worded warning in the wiki. Make sure there
are no misunderstandings: people really should not use raid56 right
now for new filesystems.

2, Instead of a --force flag (users tend to ignore --force and
warnings in documentation), ifdef out the options to create raid56 in
mkfs.btrfs.
Developers who want to test can just remove the ifdef and recompile
the tools anyway.
But if end-users have to recompile userspace, that really drives home
the point that "you really should not use this right now".

3, Reach out to the documentation teams and forums for the major
distros and make sure they update their documentation accordingly.
I think a lot of end-users, when they research something, are more
likely to go to a distro forum or wiki than to search out the upstream
one.


Re: Trying to rescue my data :(

2016-06-24 Thread ronnie sahlberg
What I would do in this situation:

1, Immediately stop writing to these disks/this filesystem. ONLY access
it in read-only mode until you have salvaged what can be salvaged.
2, Get a new 5T USB drive (they are cheap) and copy file by file off the array.
3, When you hit files that cause panics, make a note of the inode and
avoid touching that file again.

It will likely take a lot of work and time since I suspect it is a
largely manual process. But if the data is important ...


Once you have all salvageable data copied to the new drive you can
decide how to proceed,
i.e. whether you want to try to repair the filesystem (I have low
confidence in this for the parity raid case) or simply rebuild a new
fs from scratch.
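
A rough sketch of steps 2 and 3, assuming the array can be mounted
read-only at /mnt/broken and the new drive at /mnt/rescue (device names
and paths are made up):

mount -o ro,degraded /dev/sdX /mnt/broken
cd /mnt/broken

# copy file by file; log failures so those files can be skipped on the
# next pass instead of being touched again
find . -type f -print0 |
while IFS= read -r -d '' f; do
    cp --parents --preserve=all "$f" /mnt/rescue/ \
        || echo "FAILED: $f" >> /root/salvage-failed.list
done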

On Fri, Jun 24, 2016 at 9:26 AM, Steven Haigh  wrote:
> On 25/06/16 00:52, Steven Haigh wrote:
>> Ok, so I figured that despite what the BTRFS wiki seems to imply, the
>> 'multi parity' support just isn't stable enough to be used. So, I'm
>> trying to revert to what I had before.
>>
>> My setup consist of:
>>   * 2 x 3Tb drives +
>>   * 3 x 2Tb drives.
>>
>> I've got (had?) about 4.9Tb of data.
>>
>> My idea was to convert the existing setup using a balance to a 'single'
>> setup, delete the 3 x 2Tb drives from the BTRFS system, then create a
>> new mdadm based RAID6 (5 drives degraded to 3), create a new filesystem
>> on that, then copy the data across.
>>
>> So, great - first the balance:
>> $ btrfs balance start -dconvert=single -mconvert=single -f (yes, I know
>> it'll reduce the metadata redundancy).
>>
>> This promptly was followed by a system crash.
>>
>> After a reboot, I can no longer mount the BTRFS in read-write:
>> [  134.768908] BTRFS info (device xvdd): disk space caching is enabled
>> [  134.769032] BTRFS: has skinny extents
>> [  134.769856] BTRFS: failed to read the system array on xvdd
>> [  134.776055] BTRFS: open_ctree failed
>> [  143.900055] BTRFS info (device xvdd): allowing degraded mounts
>> [  143.900152] BTRFS info (device xvdd): not using ssd allocation scheme
>> [  143.900243] BTRFS info (device xvdd): disk space caching is enabled
>> [  143.900330] BTRFS: has skinny extents
>> [  143.901860] BTRFS warning (device xvdd): devid 4 uuid
>> 61ccce61-9787-453e-b793-1b86f8015ee1 is missing
>> [  146.539467] BTRFS: missing devices(1) exceeds the limit(0), writeable
>> mount is not allowed
>> [  146.552051] BTRFS: open_ctree failed
>>
>> I can mount it read only - but then I also get crashes when it seems to
>> hit a read error:
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064
>> csum 3245290974 wanted 982056704 mirror 0
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 390821102 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 550556475 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 1279883714 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 2566472073 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 1876236691 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 3350537857 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 3319706190 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 2377458007 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 2066127208 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 657140479 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 1239359620 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 1598877324 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 1082738394 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 371906697 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 2156787247 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 309399 wanted 982056704 mirror 1
>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>> 180814340 wanted 982056704 mirror 1
>> [ cut here ]
>> kernel BUG at fs/btrfs/extent_io.c:2401!
>> invalid opcode:  [#1] SMP
>> Modules linked in: btrfs x86_pkg_temp_thermal coretemp crct10dif_pclmul
>> xor aesni_intel aes_x86_64 lrw gf128mul glue_helper pcspkr raid6_pq
>> ablk_helper cryptd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
>> xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
>> BTRFS info 

Re: btrfs fail behavior when a device vanishes

2015-12-31 Thread ronnie sahlberg
Here is a kludge I hacked up.
Someone that cares could clean this up and start building a proper
test suite or something.

This test script creates a 3-disk raid1 filesystem and very slowly
writes a large file onto the filesystem while, one by one, each disk is
disconnected and then reconnected in a loop.
It is fairly trivial to trigger data loss when devices are bounced like this.

You have to run the script as root due to the calls to [u]mount and iscsiadm




On Thu, Dec 31, 2015 at 1:23 PM, ronnie sahlberg
<ronniesahlb...@gmail.com> wrote:
> On Thu, Dec 31, 2015 at 12:11 PM, Chris Murphy <li...@colorremedies.com> 
> wrote:
>> This is a torture test, no data is at risk.
>>
>> Two devices, btrfs raid1 with some stuff on them.
>> Copy from that array, elsewhere.
>> During copy, yank the active device.
>>
>> dmesg shows many of these:
>>
>> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
>> 652123, rd 697237, flush 0, corrupt 0, gen 0
>
> For automated tests a good way could be to build a multi device btrfs 
> filesystem
> ontop of it.
> For example STGT exporting n# volumes and then mount via the loopback 
> interface.
> Then you could just use tgtadm to add / remove the device in a
> controlled fashion and to any filesystem it will look exactly like if
> you pulled the device physically.
>
> This allows you to run fully automated and scripted "how long before
> the filesystem goes into total dataloss mode" tests.
>
>
>
> If you want more fine control than just plug/unplug on a live
> filesystem , you can use
> https://github.com/rsahlberg/flaky-stgt
> Again, this uses iSCSI but it allows you to script event such as
> "this range of blocks are now Uncorrectable read error" etc.
> To automatically stress test that the filesystem can deal with it.
>
>
> I created this STGT fork so that filesystem testers would have a way
> to automate testing of their failure paths.
> In particular for BTRFS which seems to still be incredible fragile
> when devices fail or disconnect.
>
> Unfortunately I don't think anyone cared very much. :-(
> Please BTRFS devs,  please use something like this for testing of
> failure modes and robustness. Please!
>
>
>
>>
>> Why are the write errors nearly as high as the read errors, when there
>> is only a copy from this device happening?
>>
>> Is Btrfs trying to write the read error count (for dev stats) of sdc1
>> onto sdc1, and that causes a write error?
>>
>> Also, is there a command to make a block device go away? At least in
>> gnome shell when I eject a USB stick, it isn't just umounted, it no
>> longer appears with lsblk or blkid, so I'm wondering if there's a way
>> to vanish a misbehaving device so that Btrfs isn't bogged down with a
>> flood of retries.
>>
>> In case anyone is curious, the entire dmesg from device insertion,
>> formatting, mounting, copying to then from, and device yanking is here
>> (should be permanent):
>> http://pastebin.com/raw/Wfe1pY4N
>>
>> And the copy did successfully complete anyway, and the resulting files
>> have the same hashes as their originals. So, yay, despite the noisy
>> messages.
>>
>>
>> --
>> Chris Murphy


test_0100_write_raid1_unplug.sh
Description: Bourne shell script


functions.sh
Description: Bourne shell script


Re: btrfs fail behavior when a device vanishes

2015-12-31 Thread ronnie sahlberg
On Thu, Dec 31, 2015 at 12:11 PM, Chris Murphy  wrote:
> This is a torture test, no data is at risk.
>
> Two devices, btrfs raid1 with some stuff on them.
> Copy from that array, elsewhere.
> During copy, yank the active device.
>
> dmesg shows many of these:
>
> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
> 652123, rd 697237, flush 0, corrupt 0, gen 0

For automated tests, a good approach could be to build a multi-device
btrfs filesystem on top of iSCSI:
for example, STGT exporting N volumes which are then mounted via the
loopback interface.
Then you could just use tgtadm to add / remove a device in a
controlled fashion, and to the filesystem it will look exactly as if
you pulled the device physically.

This allows you to run fully automated and scripted "how long before
the filesystem goes into total data loss mode" tests.



If you want finer control than just plug/unplug on a live
filesystem, you can use
https://github.com/rsahlberg/flaky-stgt
Again, this uses iSCSI, but it allows you to script events such as
"this range of blocks now returns an uncorrectable read error" etc.
to automatically stress test that the filesystem can deal with it.


I created this STGT fork so that filesystem testers would have a way
to automate testing of their failure paths,
in particular for BTRFS, which still seems to be incredibly fragile
when devices fail or disconnect.

Unfortunately I don't think anyone cared very much. :-(
Please, BTRFS devs, please use something like this for testing of
failure modes and robustness. Please!



>
> Why are the write errors nearly as high as the read errors, when there
> is only a copy from this device happening?
>
> Is Btrfs trying to write the read error count (for dev stats) of sdc1
> onto sdc1, and that causes a write error?
>
> Also, is there a command to make a block device go away? At least in
> gnome shell when I eject a USB stick, it isn't just umounted, it no
> longer appears with lsblk or blkid, so I'm wondering if there's a way
> to vanish a misbehaving device so that Btrfs isn't bogged down with a
> flood of retries.
>
> In case anyone is curious, the entire dmesg from device insertion,
> formatting, mounting, copying to then from, and device yanking is here
> (should be permanent):
> http://pastebin.com/raw/Wfe1pY4N
>
> And the copy did successfully complete anyway, and the resulting files
> have the same hashes as their originals. So, yay, despite the noisy
> messages.
>
>
> --
> Chris Murphy


Re: btrfs fail behavior when a device vanishes

2015-12-31 Thread ronnie sahlberg
On Thu, Dec 31, 2015 at 5:27 PM, Chris Murphy <li...@colorremedies.com> wrote:
> On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg
> <ronniesahlb...@gmail.com> wrote:
>> Here is a kludge I hacked up.
>> Someone that cares could clean this up and start building a proper
>> test suite or something.
>>
>> This test script creates a 3 disk raid1 filesystem and very slowly
>> writes a large file onto the filesystem while, one by one each disk is
>> disconnected then reconnected in a loop.
>> It is fairly trivial to trigger dataloss when devices are bounced like this.
>
> Yes, it's quite a torture test. I'd expect this would be a problem for
> Btrfs until this feature is done at least:
>
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22
>
> And maybe this one too
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation
>
> Already we know that Btrfs tries to write indefinitely to missing
> devices. If it reappears, what gets written? Will that device be
> consistent? And then another one goes missing, comes back, now
> possibly two devices with totally different states for identical
> generations. It's a mess. We know that trivially causes major
> corruption with btrfs raid1 if a user mounts e.g. devid1 rw,degraded
> modifies that; then mounts devid2 (only) rw,degraded and modifies it;
> and then mounts both devids together. Kablewy. Big mess. And that's
> umounting each one in between those steps; not even the abrupt
> disconnect/reconnect.

Based on my test_0100... script, one could create a test script for
that scenario too.
Even if btrfs can not handle it yet, it does not hurt to have these
tests for scenarios that MUST work before the filesystem goes officially
"stable+production".
Having these tests may even make the work of closing the robustness
gap easier, since the devs will have reproducible test scripts they
can validate new features against.


Re: btrfs fail behavior when a device vanishes

2015-12-31 Thread ronnie sahlberg
On Thu, Dec 31, 2015 at 5:27 PM, Chris Murphy <li...@colorremedies.com> wrote:
> On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg
> <ronniesahlb...@gmail.com> wrote:
>> Here is a kludge I hacked up.
>> Someone that cares could clean this up and start building a proper
>> test suite or something.
>>
>> This test script creates a 3 disk raid1 filesystem and very slowly
>> writes a large file onto the filesystem while, one by one each disk is
>> disconnected then reconnected in a loop.
>> It is fairly trivial to trigger dataloss when devices are bounced like this.
>
> Yes, it's quite a torture test. I'd expect this would be a problem for
> Btrfs until this feature is done at least:
>
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22
>
> And maybe this one too
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation
>
> Already we know that Btrfs tries to write indefinitely to missing
> devices.

Another question is how it handles writes when the mirror set becomes
degraded that way.
I would expect it to:
* immediately emergency-destage any dirty data in the write cache to
the surviving member disks.
* switch any future I/O to that mirror set to ordered and
synchronous writes to the surviving members.

> If it reappears, what gets written? Will that device be
> consistent? And then another one goes missing, comes back, now
> possibly two devices with totally different states for identical
> generations. It's a mess. We know that trivially causes major
> corruption with btrfs raid1 if a user mounts e.g. devid1 rw,degraded
> modifies that; then mounts devid2 (only) rw,degraded and modifies it;
> and then mounts both devids together. Kablewy. Big mess. And that's
> umounting each one in between those steps; not even the abrupt
> disconnect/reconnect.
>
>
> --
> Chris Murphy


Re: Timeouts copying large files to a Samba server with Btrfs

2015-12-26 Thread ronnie sahlberg
That does not look good.
See if you can find something in the samba logs on the server.
Look for messages about long-running VFS operations and/or a client
disconnecting while a file is open for writing.



The CIFS/SMB protocol has hard real-time requirements in the Windows
client redirector, which leads to data loss if a server becomes
unresponsive for a long time.
"Long time" here means ~20s or more.

The reason is that, for performance, CIFS/SMB defaults to client-side
caching for writes (using oplocks as the cache coherency protocol).
If a server suddenly stops responding promptly, the client will
eventually (20-60 seconds) tear down the connection and reconnect. As
part of the session teardown, any open files will be force-closed, and
any write cache on the client will be discarded.

This basically means that if a server gets stuck in the VFS on a slow
filesystem, you face a real risk that any/all files that are open for
writing will be truncated at that point, and you have data loss.


This used to be a big problem when using samba on top of various
cluster filesystems, since they had a tendency to pause all I/O for
sometimes very long times when the cluster topology changed, leading
to a large amount of data loss every time.
We added some logging to samba to help identify this and also to log
the names of the files that were very likely destroyed, but I can't
recall the exact wording of these messages off the top of my head.
Look in the samba logs for things that relate to long-running VFS
operations or a client disconnect while a file is open for write.


Basically, if you want to use a filesystem to host CIFS, you must
instrument it so that it is guaranteed to always respond to I/O
requests from the clients within 10 seconds (to leave some headroom),
or else you will face a real risk of data loss.


If you can not guarantee that the filesystem will never pause for this
long because it is doing foo/bar/bob/...   then you should not use
that filesystem for samba.
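
Something like the following is a reasonable starting point on the
server (standard samba tooling; the grep patterns are only guesses, not
the exact log messages I mentioned above):

# temporarily raise the samba log level, then reproduce the failing copy
smbcontrol all debug 3

# afterwards, look for disconnects / errors around the time of the timeout
grep -iE 'disconnect|timed out|NT_STATUS' /var/log/samba/log.*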

On Sat, Dec 19, 2015 at 1:50 PM, Roman Mamedov  wrote:
> Hello,
>
> Sometimes when I copy large files (the latest case was with a 13 GB file) to a
> Btrfs-residing share on a Samba file server (using Thunar file manager), the
> copy process fails around the end with following messages in dmesg on the
> client:
>
> [7699154.504380] CIFS VFS: sends on sock 88010d41e800 stuck for 15 seconds
> [7699154.504440] CIFS VFS: Error -11 sending data on socket to server
> [7699215.173469] CIFS VFS: sends on sock 88010d41e800 stuck for 15 seconds
> [7699215.173533] CIFS VFS: Error -11 sending data on socket to server
> [7699317.982262] CIFS VFS: sends on sock 88010d41e800 stuck for 15 seconds
> [7699317.982319] CIFS VFS: Error -11 sending data on socket to server
>
> Nothing in dmesg on the server.
>
> My guess is that the Samba server process submits too much queued buffers at
> once to be written to disk, then blocks on waiting for this, and the whole
> operation ends up taking so long, that it doesn't get back to the client in
> time.
>
> This also happens much more often is compress-force is enabled on the server.
>
> The server specs are AMD E-350 1.6GHz, 16GB of RAM, client/server network
> connection is 1 Gbit. Kernel 4.1.15 on the server, 3.18.21 on the client.
>
> Any idea what to tune so that this doesn't happen? 
> (server/client/Samba/Btrfs?)
>
> --
> With respect,
> Roman


Re: Btrfs/RAID5 became unmountable after SATA cable fault

2015-10-21 Thread ronnie sahlberg
If it is mostly for archival storage, I would suggest you take a look
at snapraid.


On Wed, Oct 21, 2015 at 9:09 AM, Janos Toth F.  wrote:
> I went through all the recovery options I could find (starting from
> read-only to "extraordinarily dangerous"). Nothing seemed to work.
>
> A Windows based proprietary recovery software (ReclaiMe) could scratch
> the surface but only that (it showed me the whole original folder
> structure after a few minutes of scanning and the "preview" of some
> some plaintext files was promising but most of the bigger files seemed
> to be broken).
>
> I used this as a bulk storage for backups and all the things I didn't
> care to keep in more than one copies but that includes my
> "scratchpad", so I cared enough to use RAID5 mode and to try restoring
> some things.
>
> Any last ideas before I "ata secure erase" and sell/repurpose the disks?


Re: Btrfs/RAID5 became unmountable after SATA cable fault

2015-10-21 Thread ronnie sahlberg
Maybe hold off erasing the drives a little in case someone wants to
collect some extra data for diagnosing how/why the filesystem got into
this unrecoverable state.

A single device having issues should not cause the whole filesystem to
become unrecoverable.

On Wed, Oct 21, 2015 at 9:09 AM, Janos Toth F.  wrote:
> I went through all the recovery options I could find (starting from
> read-only to "extraordinarily dangerous"). Nothing seemed to work.
>
> A Windows based proprietary recovery software (ReclaiMe) could scratch
> the surface but only that (it showed me the whole original folder
> structure after a few minutes of scanning and the "preview" of some
> some plaintext files was promising but most of the bigger files seemed
> to be broken).
>
> I used this as a bulk storage for backups and all the things I didn't
> care to keep in more than one copies but that includes my
> "scratchpad", so I cared enough to use RAID5 mode and to try restoring
> some things.
>
> Any last ideas before I "ata secure erase" and sell/repurpose the disks?


Re: Can BTRFS handle XATTRs larger than 4K?

2014-12-22 Thread ronnie sahlberg
On Mon, Dec 22, 2014 at 2:52 PM, Robert White rwh...@pobox.com wrote:
 On 12/22/2014 12:44 PM, Austin S Hemmelgarn wrote:

 On 2014-12-22 15:06, Richard Sharpe wrote:

 On Mon, Dec 22, 2014 at 10:43 AM, Chris Murphy
 li...@colorremedies.com wrote:

 On Mon, Dec 22, 2014 at 11:09 AM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:

 Personally, I'd love to see unlimited length xattr's like NTFS and
 HFS+ do,
 as that would greatly improve interoperability (both Windows and OS
 X use
 xattrs, although they call them 'alternative data streams' and 'forks'
 respectively), and provide a higher likelihood that xattrs would start
 getting used more.


 This is two years old, but it looks like NFS will not support xattr.
 http://comments.gmane.org/gmane.linux.nfs/53259

 It looks like SMB does support xattr (and sometimes requires it) but I
 have no idea to what degree, including the host/client preservation on
 different filesystems. [1] It would still be helpful for cp and rsync
 to be able to preserve xattr, however Apple has moved to a new on-disk
 format that makes the future of reading OS X volumes on Linux an open
 question. [2]


 Those are the old OS 2 XATTRs, better known as EAs, and NTFS says that
 you can support EAs or you can have Reparse Points, but not both
 (basically, they re-used the EA Length field as the reparse tag).
 Also, Windows (of any flavor) does not make it easy to access EAs.

 But OS/2 style XATTRS are not the same as NTFS Alternative Data Streams,
 which technically (because of Windows backward compatibility interfaces)
 don't need a huge amount of support from SMB.  They were originally
 added to support SFM in NT3.1, so that windows could store resource
 forks.  The two primary uses on windows today are the file history
 interface in Win8/8.1 and the 'zone_identifier' saved with downloads by
 most modern browsers.  They're actually pretty easy to get to, you just
 append the ADS name to the end of the filename with a : separating them,
 and you can access it like a regular file (which is part of why : isn't
 a legal character in a windows filename).  Most people don't know about
 them because they don't get listed in windows explorer, even with hidden
 files and protected OS file visible.  The actual on-disk format for them
 is actually kind of interesting, the file data itself (what in the Apple
 world is called the Data Fork) is actually stored as an unnamed ADS
 associated with the filename.


 My stupid two cents...

 Wouldn't keeping a file history be better done with something git-like
 (monotonish? 8-) combined with an incron type file-watcher?

 So like a small xattr to link the file to the repository or something...

 setfattr --name=user.history_repo --value=/path/to/repository file

 and some not-in-the-kernel subsystems?

Atomicity.
NTFS/ADS snapshots are created atomically.

There is of course nothing that prevents the "separate file +
setfattr pointing to the file" approach from working, but it is
inconvenient and somewhat ugly since you have to implement your own
transaction and rollback mechanism every time :-(

What if someone else modifies/renames/deletes '/path/to/repository file'?
Can you prevent that? If not, how do you detect it and handle it
when 'user.history_repo' no longer points to the right data?
You have to write code to handle it somehow.

It is a lot more convenient when this is a first-class filesystem API,
since you don't have to deal with it. The person writing the filesystem
dealt with it for you.



The second really big benefit of the ADS approach is that once it is a
first-class API,
you don't need to rewrite all third-party apps to understand the
mapping 'user.history_repo' = '/path/to/repository file'.
If it is a standard filesystem feature, then you just update 'tar',
'rsync', 'scp', 'cp', 'mv', ...
once and it will work. You don't need to re-implement this API in
these tools every time a new mapping is invented.

I.e. by having it as a standard API for the filesystem, you update the
external tools once and it will work reliably, always, across all
filesystems (that support it).
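
To make the race concrete, the indirection approach sketched earlier in
the thread boils down to two separate, unrelated steps (the paths here
are made up):

# step 1: store the history copy somewhere
cp important.doc /var/history-repo/important.doc.v1

# step 2: link the file to it via an xattr
setfattr -n user.history_repo -v /var/history-repo/important.doc.v1 important.doc

# nothing stops a rename/delete of /var/history-repo/... between or after
# these steps, and nothing keeps the two in sync -- that is the missing
# atomicity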




 What is the practical use case for really large XATTRS that isn't solved by
 indirection to non-kernel facilities.

metadata
and
snapshots


 (That's not snark, I've been trying to figure out why _I_ would want that
 expanse of auxiliary data so inconveniently stored and I've come up with
 nothing. Maybe I lack imagination.)

 I see the ADS thing now that you mention it. Kinda neat way to recycle the
 otherwise much-disparaged colon-is-for-devices thing. But would that sort of
 thing match the Linux/POSIX expectation at all? Everything in *ix being a
 file, the chaos of expectation from the equivalent /dev/sda1:ADS (to make it
 a portmantu of sorts) becomes unfriendly.


 I've thought of some interesting things to do whit XATTRS (like a kernel
 patch to let an executable carry around environment overrides like a
 restricted/overridden PATH) or include the intended editor 

Re: ENOSPC with mkdir and rename

2014-08-05 Thread ronnie sahlberg
On Tue, Aug 5, 2014 at 5:20 AM, Russell Coker russ...@coker.com.au wrote:


 Based on what I've read on this list it seems that BTRFS is less stable in
 3.15 than in 3.14.  Even 3.14 isn't something I'd recommend to random people
 who want something to just work.

 The Debian installer has BTRFS in a list of filesystems to choose with no
 special notice about it.  I'm thinking of filing a Debian bug requesting that
 they put a warning against it.

 What do people here think?

+1 for a warning.

btrfs is still a young filesystem and not as stable as, say, ext4.
I think it would be very prudent to have a small warning.


Re: 1 week to rebuid 4x 3TB raid10 is a long time!

2014-07-21 Thread ronnie sahlberg
On Sun, Jul 20, 2014 at 7:48 PM, Duncan 1i5t5.dun...@cox.net wrote:
 ashford posted on Sun, 20 Jul 2014 12:59:21 -0700 as excerpted:

 If you assume a 12ms average seek time (normal for 7200RPM SATA drives),
 an 8.3ms rotational latency (half a rotation), an average 64kb write and
 a 100MB/S streaming write speed, each write comes in at ~21ms, which
 gives us ~47 IOPS.  With the 64KB write size, this comes out to ~3MB/S,
 DISK LIMITED.

 The 5MB/S that TM is seeing is fine, considering the small files he says
 he has.

 Thanks for the additional numbers supporting my point. =:^)

 I had run some of the numbers but not to the extent you just did, so I
 didn't know where 5 MiB/s fit in, only that it wasn't entirely out of the
 range of expectation for spinning rust, given the current state of
 optimization... or more accurately the lack thereof, due to the focus
 still being on features.


That is actually nonsense.
Raid rebuild operates on the block/stripe layer, not on the filesystem layer.
It does not matter at all what the average file size is.

Raid rebuild is really only limited by disk I/O speed when performing
a linear read of the whole spindle using huge I/O sizes,
or, if you have multiple spindles on the same bus, by the bus saturation speed.

Thus it is perfectly reasonable to expect ~50 MByte/second, per spindle,
when doing a raid rebuild.
That is for the naive rebuild that rebuilds every single stripe. A
smarter rebuild that knows which stripes are unused can skip them and
be even faster than that.
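
As a rough sanity check on those numbers (figures rounded, using the 3TB
drives from the subject line):

\[
\frac{3\ \mathrm{TB}}{50\ \mathrm{MB/s}} \approx 6\times10^{4}\ \mathrm{s}
\approx 17\ \text{hours per spindle},
\qquad
\frac{3\ \mathrm{TB}}{5\ \mathrm{MB/s}} \approx 6\times10^{5}\ \mathrm{s}
\approx 1\ \text{week}.
\]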


Now, that the rebuild is off by an order of magnitude is by design and
should be fixed at some stage, but with the current state of btrfs it
is probably better to focus on other, more urgent areas first.


Testing with flaky disk

2014-07-21 Thread ronnie sahlberg
List, btrfs developers.

I started working on a test tool for SCSI initiators and filesystem folks.
It is an iSCSI target that implements a bad, flaky disk where you
can control precisely how/what is broken, which you can use to test
error and recovery paths in the initiator/filesystem.

The tool is available at :
https://github.com/rsahlberg/flaky-stgt.git
and is a modified version of the TGTD iscsi target.


Right now it is just an initial prototype and it needs more work to
add more types of errors as well as to make it more user-friendly.
But it is already useful enough to illustrate certain failure cases,
which could be helpful to btrfs and others.


Let me illustrate. Let's start by creating a BTRFS filesystem spanning
three 1G disks:

#
# Create three disks and export them through flaky iSCSI
#
truncate -s 1G /data/tmp/disk1.img
truncate -s 1G /data/tmp/disk2.img
truncate -s 1G /data/tmp/disk3.img

killall -9 tgtd
./usr/tgtd -f -d 1 &

sleep 3

./usr/tgtadm --op new --mode target --tid 1 -T iqn.ronnie.test

./usr/tgtadm --op new --mode logicalunit --tid 1 --lun 1 -b
/data/tmp/disk1.img --blocksize=4096
./usr/tgtadm --op new --mode logicalunit --tid 1 --lun 2 -b
/data/tmp/disk2.img --blocksize=4096
./usr/tgtadm --op new --mode logicalunit --tid 1 --lun 3 -b
/data/tmp/disk3.img --blocksize=4096

./usr/tgtadm --op bind --mode target --tid 1 -I ALL


#
# connect to the three disks
#
iscsiadm --mode discoverydb --type sendtargets --portal 127.0.0.1 --discover
iscsiadm --mode node --targetname iqn.ronnie.test --portal
127.0.0.1:3260 --login
#
# check dmesg, you should now have three new 1G disks
#
# Use: iscsiadm --mode node --targetname iqn.ronnie.test \
#  --portal 127.0.0.1:3260 --logout
# to disconnect the disks when you are finished.


# create a btrfs filesystem
mkfs.btrfs -f -d raid1 \
/dev/disk/by-path/ip-127.0.0.1:3260-iscsi-iqn.ronnie.test-lun-1 \
/dev/disk/by-path/ip-127.0.0.1:3260-iscsi-iqn.ronnie.test-lun-2 \
/dev/disk/by-path/ip-127.0.0.1:3260-iscsi-iqn.ronnie.test-lun-3

# mount the filesystem
mount /dev/disk/by-path/ip-127.0.0.1:3260-iscsi-iqn.ronnie.test-lun-1 /mnt


Then we can proceed to copy a bunch of data to the filesystem so that
there will be some blocks in use.


Now we can see what happens in the case of a single bad disk.
Let's say the disk has gone bad: it is still possible to read from the
disk, but all writes fail with a medium error.
Perhaps this is similar to the case of a cheap disk that has
completely run out of blocks to reallocate to?


===
# make all writes to the third disk fail with write error.
# 3 - MEDIUM ERROR
# 0x0c02 - WRITE ERROR AUTOREALLOCATION FAILED
#
./usr/tgtadm --mode error --op new --tid 1 --lun 3 --error
op=WRITE10,lba=0,len=,pct=100,pause=0,repeat=0,action=CHECK_CONDITION,key=3,asc=0x0c02

# To show all current error injects:
# ./usr/tgtadm --mode error --op show
#
# To delete/clear all current error injects:
# ./usr/tgtadm --mode error --op delete
===



If you now know that this disk has gone bad, you could try to delete
the device:

btrfs device delete \
/dev/disk/by-path/ip-127.0.0.1:3260-iscsi-iqn.ronnie.test-lun-3 /mnt

but this will probably not work, since at least up to semi-recent
versions of btrfs you can not remove a device from the filesystem
UNLESS you can also write to the device.

This makes it impossible to remove the bad device in any way other than
physically removing it.
That is suboptimal from a data integrity point of view, since if the
disk is readable it can potentially still contain valid copies of data
that might be silently corrupted on the other mirror.

At some stage, from a data integrity and robustness standpoint,
it would be nice to be able to "device delete" a device that is
readable, and contains a valid copy of the data, but is unwriteable.

There are a bunch of other things you can test and emulate with this too.
I have only tested this with semi-recent versions of btrfs and not the
latest version.
I will wait until the current versions of btrfs become more
stable/robust before I start experimenting with them.


Since I think this could be invaluable for a filesystem
developer, please have a look. I am more than happy to add additional
features that would make it even more useful for
filesystem error-path and recovery testing.



regards
ronnie sahlberg


Re: btrfs on software RAID0

2014-05-05 Thread ronnie sahlberg
start-btrfs-dmcrypt :
...
echo $pwd |
...

Hmmm. This makes the plaintext password visible in ps output.
It is probably better to pass this in by redirecting a file to stdin.



On Mon, May 5, 2014 at 2:25 PM, Marc MERLIN m...@merlins.org wrote:
 On Mon, May 05, 2014 at 10:51:46PM +0200, john terragon wrote:
 Hi.
 I'm about to try btrfs on an RAID0 md device (to be precise there will
 be dm-crypt in between the md device and btrfs). If I used ext4 I
 would set the stride and stripe_width extended options. Is there
 anything similar I should be doing with mkfs.btrfs? Or maybe some
 mount options beneficial to this kind of setting.

 This is not directly an answer to your question, so far I haven't used a
 special option like this with btrfs on my arrays although my
 undertstanding is that it's not as important as with ext4.

 That said, please read
 http://marc.merlins.org/perso/btrfs/post_2014-04-27_Btrfs-Multi-Device-Dmcrypt.html

 1) use align-payload=1024 on cryptsetup instead of something bigger like
 8192. This will reduce write amplification (if you're not on an SSD).

 2) you don't need md0 in the middle, crypt each device and then use
 btrfs built in raid0 which will be faster (and is stable, at least as
 far as we know :) ).

 Then use /etc/crypttab or a script like this
 http://marc.merlins.org/linux/scripts/start-btrfs-dmcrypt
 to decrypt all your devices in one swoop and mount btrfs.

 Marc
 --
 A mouse is a device used to point at the xterm you want to type in - A.S.R.
 Microsoft is to operating systems 
    what McDonalds is to gourmet 
 cooking
 Home page: http://marc.merlins.org/ | PGP 
 1024R/763BE901


Re: Which companies are using Btrfs in production?

2014-04-25 Thread ronnie sahlberg
On Fri, Apr 25, 2014 at 8:20 AM, Marc MERLIN m...@merlins.org wrote:
 On Fri, Apr 25, 2014 at 04:47:04PM +0200, David Sterba wrote:
 On Thu, Apr 24, 2014 at 04:14:56PM -0700, Marc MERLIN wrote:
Netgear uses BTRFS as the filesystem in their refreshed ReadyNAS line.
They apparently use Oracle's linux distro so I assume they're relying 
on
them to do most of the heavy lifting as far as support BTRFS and
backporting goes since they're still on 3.0! They also have raid5/6
support so they are probably running BTRFS on top of md.
   
  
   Yes, and any contributions you see coming from me so far, come from
   NETGEAR.  I've been using my gmail account because I can't make our
 
  Thanks.
  https://btrfs.wiki.kernel.org/index.php/Contributors
  Updated :)

 There are lots of contributors with the same small amout of patches
 contributed and are not listed there.  This is first time I hear about
 Netgear being a contributor and it looks strange to see that name among
 the major contributors.

 If there's demand to list all the minor contributors, then let's add a
 separate section, otherwise I'm going to remove the entry.

 Mmmh. So I'm not Jon Corbet who has all those fancy honed scripts + non
 trivial time he spends doing this by hand.

 That said, my goal was not to say which company gave the most
 contributions and try and rank them.
 Honestly, right now any company that is using btrfs and contributing to
 it is a great thing in my book.
 I'm not even a fan of counting number of lines or frequency of patches.
 How do you compare someone sending easy cleanup patches vs someone who
 spent a month tracking down a file corruption problem no one could find
 nor fix, and sends a 3 line patch to fix it in the end?


+1

For such a small community I think it would be a mistake to have
arbitrary quality or quantity
thresholds for who deserves to be on the list.

I think you should list everyone. Even if they only sent a single
patch to fix a typo in a comment.
If they sent a patch, it means they care about the code base, and so
they should be on the list.


 But eh, I'm just one guy and and it's just my opinion :)

Me too.




 How about we leave that decision with Chris Mason?

 Marc
 --
 A mouse is a device used to point at the xterm you want to type in - A.S.R.
 Microsoft is to operating systems 
    what McDonalds is to gourmet 
 cooking
 Home page: http://marc.merlins.org/ | PGP 
 1024R/763BE901


Re: How to handle a RAID5 arrawy with a failing drive?

2014-03-16 Thread ronnie sahlberg
On Sun, Mar 16, 2014 at 4:17 PM, Marc MERLIN m...@merlins.org wrote:
 On Sun, Mar 16, 2014 at 05:12:10PM -0600, Chris Murphy wrote:

 On Mar 16, 2014, at 4:55 PM, Chris Murphy li...@colorremedies.com wrote:

  Then use btrfs replace start.

 Looks like in 3.14rc6 replace isn't yet supported. I get dev_replace cannot 
 yet handle RAID5/RAID6.

 When I do:
 btrfs device add new mp

 The command hangs, no kernel messages.

 Ok, that's kind of what I thought.
 So, for now, with raid5:
 - btrfs seems to handle a drive not working
 - you say I can mount with the drive missing in degraded mode (I haven't
   tried that, I will)
 - but no matter how I remove the faulty drive, there is no rebuild on a
   new drive procedure that works yet

 Correct?

There was a discussion a while back that suggested that a balance
would read all blocks and write them out again and that would recover
the data.

I have no idea if that works or not.
Only do this as a last resort once you have already considered all
data lost forever.
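
Roughly, the procedure described above (just a sketch, untested by me;
device names and the mountpoint are placeholders) would be:

# mount -o degraded /dev/sdb /mnt
# btrfs device add /dev/sdnew /mnt
# btrfs device delete missing /mnt
# btrfs balance start /mnt

with the final balance being the "rewrite everything" step from that
discussion. I have no idea how much of that actually works on raid5 today.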


Re: What to do about df and btrfs fi df

2014-02-10 Thread ronnie sahlberg
On Mon, Feb 10, 2014 at 7:13 PM, cwillu cwi...@cwillu.com wrote:
 On Mon, Feb 10, 2014 at 7:02 PM, Roger Binns rog...@rogerbinns.com wrote:

 On 10/02/14 10:24, cwillu wrote:
 The regular df data used number should be the amount of space required
 to hold a backup of that content (assuming that the backup maintains
 reflinks and compression and so forth).

 There's no good answer for available space;

 I think the flipside of the above works well.  How large a group of files
 can you expect to create before you will get ENOSPC?

 That for example is the check that code looking at df does - I need to put
 in X GB of files - will it fit?  It is also what users do.

 But the answer changes dramatically depending on whether it's large
 numbers of small files or a small number of large files, and the
 conservative worst-case choice means we report a number that is half
 what is probably expected.

I don't think that is a problem, as long as the avail guesstimate is
conservative.

Scenario:
A user has 10G of files and df reports that there are 11G available.
I think the expectation is that copying these 10G into the filesystem
will not ENOSPC.
After the copy completes, whether the new avail number is ==1G or >1G
is less important IMHO.

I.e. I like to see df output as a "you can write AT LEAST this much
more data until the filesystem is full" number.
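
As a rough illustration (my own numbers, assuming raid1 data so every new
byte costs two raw bytes): with 22G of raw space left, a conservative "at
least 11G available" guarantees that a 10G copy will fit; if the data had
gone into single-copy chunks the true number would be closer to 22G, but
the conservative figure never over-promises.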


That was my 5 cent.


Re: Scrubbing with BTRFS Raid 5

2014-01-22 Thread ronnie sahlberg
On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason c...@fb.com wrote:
 On Tue, 2014-01-21 at 17:08 +, Duncan wrote:
 Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:

  Thanks for all the info guys.
 
  I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
  attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
  them.
 
  I copied some data (from dev/urandom) into two test files and got their
  MD5 sums and saved them to a text file.
 
  I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
  attached to /dev/loop4.
 
  I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
  added /dev/loop4 to the volume and then deleted the missing device and
  it rebalanced. I had data spread out on all three devices now. MD5 sums
  unchanged on test files.
 
  This, to me, implies BTRFS RAID 5 is working quite well and I can in
  fact,
  replace a dead drive.
 
  Am I missing something?

 What you're missing is that device death and replacement rarely happens
 as neatly as your test (clean unmounts and all, no middle-of-process
 power-loss, etc).  You tested best-case, not real-life or worst-case.

 Try that again, setting up the raid5, setting up a big write to it,
 disconnect one device in the middle of that write (I'm not sure if just
 dropping the loop works or if the kernel gracefully shuts down the loop
 device), then unplugging the system without unmounting... and /then/ see
 what sense btrfs can make of the resulting mess.  In theory, with an
 atomic write btree filesystem such as btrfs, even that should work fine,
 minus perhaps the last few seconds of file-write activity, but the
 filesystem should remain consistent on degraded remount and device add,
 device remove, and rebalance, even if another power-pull happens in the
 middle of /that/.

 But given btrfs' raid5 incompleteness, I don't expect that will work.


 raid5/6 deals with IO errors from one or two drives, and it is able to
 reconstruct the parity from the remaining drives and give you good data.

 If we hit a crc error, the raid5/6 code will try a parity reconstruction
 to make good data, and if we find good data from the other copy, it'll
 return that up to userland.

 In other words, for those cases it works just like raid1/10.  What it
 won't do (yet) is write that good data back to the storage.  It'll stay
 bad until you remove the device or run balance to rewrite everything.

 Balance will reconstruct parity to get good data as it balances.  This
 isn't as useful as scrub, but that work is coming.


That is awesome!

What about online conversion from not-raid5/6 to raid5/6? What is the
status of that code? For example, what happens if there is a failure
during the conversion or a reboot?



 -chris





Re: Scrubbing with BTRFS Raid 5

2014-01-22 Thread ronnie sahlberg
On Wed, Jan 22, 2014 at 1:16 PM, Chris Mason c...@fb.com wrote:
 On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
 On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason c...@fb.com wrote:
  On Tue, 2014-01-21 at 17:08 +, Duncan wrote:
  Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
 
   Thanks for all the info guys.
  
   I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
   attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
   them.
  
   I copied some data (from dev/urandom) into two test files and got their
   MD5 sums and saved them to a text file.
  
   I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
   attached to /dev/loop4.
  
   I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
   added /dev/loop4 to the volume and then deleted the missing device and
   it rebalanced. I had data spread out on all three devices now. MD5 sums
   unchanged on test files.
  
   This, to me, implies BTRFS RAID 5 is working quite well and I can in
   fact,
   replace a dead drive.
  
   Am I missing something?
 
  What you're missing is that device death and replacement rarely happens
  as neatly as your test (clean unmounts and all, no middle-of-process
  power-loss, etc).  You tested best-case, not real-life or worst-case.
 
  Try that again, setting up the raid5, setting up a big write to it,
  disconnect one device in the middle of that write (I'm not sure if just
  dropping the loop works or if the kernel gracefully shuts down the loop
  device), then unplugging the system without unmounting... and /then/ see
  what sense btrfs can make of the resulting mess.  In theory, with an
  atomic write btree filesystem such as btrfs, even that should work fine,
  minus perhaps the last few seconds of file-write activity, but the
  filesystem should remain consistent on degraded remount and device add,
  device remove, and rebalance, even if another power-pull happens in the
  middle of /that/.
 
  But given btrfs' raid5 incompleteness, I don't expect that will work.
 
 
  raid5/6 deals with IO errors from one or two drives, and it is able to
  reconstruct the parity from the remaining drives and give you good data.
 
  If we hit a crc error, the raid5/6 code will try a parity reconstruction
  to make good data, and if we find good data from the other copy, it'll
  return that up to userland.
 
  In other words, for those cases it works just like raid1/10.  What it
  won't do (yet) is write that good data back to the storage.  It'll stay
  bad until you remove the device or run balance to rewrite everything.
 
  Balance will reconstruct parity to get good data as it balances.  This
  isn't as useful as scrub, but that work is coming.
 

 That is awesome!

 What about online conversion from not-raid5/6 to raid5/6  what is the
 status for that code, for example
 what happens if there is a failure during the conversion or a reboot ?

 The conversion code uses balance, so that works normally.  If there is a
 failure during the conversion you'll end up with some things raid5/6 and
 some things at whatever other level you used.

 The data will still be there, but you are more prone to enospc
 problems ;)


Ok, but if there is enough space,  you could just restart the balance
and it will eventually finish and all should, with some luck, be ok?

Awesome. This sounds like things are a lot closer to raid5/6 being
fully operational than I realized.
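
For the record, my understanding of what the conversion looks like on the
command line (a sketch only; the mountpoint is a placeholder and I have not
tried this on raid5/6 myself):

# btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt

and if that gets interrupted, re-running the same command should
eventually leave all chunks at the new profile.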


 -chris



Re: Unmountable Array After Drive Failure During Device Deletion

2013-12-21 Thread ronnie sahlberg
Similar things happened to me. (See my unanswered posts from ~1 Sep; this fs
is not really ready for production, I think.)

When you get wrong transid errors and reports that you have checksums
being repaired,
that is all bad news and no one can help you.

Unfortunately there are, I think, no real tools to fix basic fs errors.


I never managed to get my filesystem into a state where it could be mounted
at all, but I did manage to recover most of my data using
btrfs restore from
https://github.com/FauxFaux/btrfs-progs

This is the option of that command that I used to recover data:
I got most of my data back with it, but YMMV.

  commit 2a2a1fb21d375a46f9073e44a7b9d9bb7bfaa1e2
  Author: Peter Stuge pe...@stuge.se
  Date:   Fri Nov 25 01:03:58 2011 +0100

restore: Add regex matching of paths and files to be restored

The option -m is used to specify the regex string. -c is used to
specify case insensitive matching. -i was already taken.

In order to restore only a single folder somewhere in the btrfs
tree, it is unfortunately necessary to construct a slightly
nontrivial regex, e.g.:

restore -m '^/(|home(|/username(|/Desktop(|/.*))))$' /dev/sdb2 /output

This is needed in order to match each directory along the way to the
Desktop directory, as well as all contents below the Desktop directory.

Signed-off-by: Peter Stuge pe...@stuge.se
Signed-off-by: Josef Bacik jo...@redhat.com


I won't give advice for your data.
For my data, I copied as much data as I could recover from the
filesystem over to a different filesystem
using the tools in the repo above.
After that, destroy the damaged filesystem and rebuild from scratch.
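
For reference, the kind of restore invocation I used looked roughly like
this (device and target paths are placeholders, and the exact options may
differ depending on the btrfs-progs version):

# btrfs restore -v /dev/sde /mnt/recovery

with the -m regex option from the commit above added when I only wanted a
subtree back.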


Then, depending on how important your data is, you start making
backups regularly, or switch to a less fragile and more repairable fs.


On Thu, Dec 19, 2013 at 1:26 AM, Chris Kastorff encryp...@gmail.com wrote:
 I'm using btrfs in data and metadata RAID10 on drives (not on md or any
 other fanciness.)

 I was removing a drive (btrfs dev del) and during that operation, a
 different drive in the array failed. Having not had this happen before,
 I shut down the machine immediately due to the extremely loud piezo
 buzzer on the drive controller card. I attempted to do so cleanly, but
 the buzzer cut through my patience and after 4 minutes I cut the power.

 Afterwards, I located and removed the failed drive from the system, and
 then got back to linux. The array no longer mounts (failed to read the
 system array on sdc), with nearly identical messages when attempted
 with -o recovery and -o recovery,ro.

 btrfsck asserts and coredumps, as usual.

 The drive that was being removed is devid 9 in the array, and is
 /dev/sdm1 in the btrfs fi show seen below.

 Kernel 3.12.4-1-ARCH, btrfs-progs v0.20-rc1-358-g194aa4a-dirty
 (archlinux build.)

 Can I recover the array?

 == dmesg during failure ==

 ...
 sd 0:2:3:0: [sdd] Unhandled error code
 sd 0:2:3:0: [sdd]
 Result: hostbyte=0x04 driverbyte=0x00
 sd 0:2:3:0: [sdd] CDB:
 cdb[0]=0x2a: 2a 00 26 89 5b 00 00 00 80 00
 end_request: I/O error, dev sdd, sector 646535936
 btrfs_dev_stat_print_on_error: 7791 callbacks suppressed
 btrfs: bdev /dev/sdd errs: wr 315858, rd 230194, flush 0, corrupt 0, gen 0
 sd 0:2:3:0: [sdd] Unhandled error code
 sd 0:2:3:0: [sdd]
 Result: hostbyte=0x04 driverbyte=0x00
 sd 0:2:3:0: [sdd] CDB:
 cdb[0]=0x2a: 2a 00 26 89 5b 80 00 00 80 00
 end_request: I/O error, dev sdd, sector 646536064
 ...

 == dmesg after new boot, mounting attempt ==

 btrfs: device label lake devid 11 transid 4893967 /dev/sda
 btrfs: disk space caching is enabled
 btrfs: failed to read the system array on sdc
 btrfs: open_ctree failed

 == dmesg after new boot, mounting attempt with -o recovery,ro ==

 btrfs: device label lake devid 11 transid 4893967 /dev/sda
 btrfs: enabling auto recovery
 btrfs: disk space caching is enabled
 btrfs: failed to read the system array on sdc
 btrfs: open_ctree failed

 == btrfsck ==

 deep# btrfsck /dev/sda
 warning, device 14 is missing
 warning devid 14 not found already
 parent transid verify failed on 87601116364800 wanted 4893969 found 4893913
 parent transid verify failed on 87601116364800 wanted 4893969 found 4893913
 parent transid verify failed on 87601116381184 wanted 4893969 found 4893913
 parent transid verify failed on 87601116381184 wanted 4893969 found 4893913
 parent transid verify failed on 87601115320320 wanted 4893969 found 4893913
 parent transid verify failed on 87601115320320 wanted 4893969 found 4893913
 parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
 parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
 Ignoring transid failure
 Checking filesystem on /dev/sda
 UUID: d5e17c49-d980-4bde-bd96-3c8bc95ea077
 checking extents
 parent transid verify failed on 87601117159424 wanted 4893969 found 4893913
 parent transid verify failed on 87601117159424 wanted 4893969 found 4893913
 parent transid verify failed on 87601116368896 wanted 

Re: Triple parity and beyond

2013-11-26 Thread ronnie sahlberg
This is great stuff.
Now, how can we get this into btrfs and md?
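
For anyone who wants to sanity-check the construction quoted below, here is
a quick worked check of my own (assuming the usual RAID-6 field GF(2^8)
with polynomial 0x11D): each entry is 1/(x_j XOR y_i), so the top row with
y=0 is just 1/x_j, e.g. 1/142 = 2 because 2*142 = 1 in that field, and the
second row with y=2 starts with 1/(1 XOR 2) = 1/3 = 244 because 3*244 = 1
there as well.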


On Wed, Nov 20, 2013 at 1:23 PM, Andrea Mazzoleni amadva...@gmail.com wrote:
 Hi,

 First, create a 3 by 6 Cauchy matrix, using x_i = 2^-i, and y_i = 0 for i=0,
 and y_i = 2^i for other i.
 In this case:   x = { 1, 142, 71, 173, 216, 108 }  y = { 0, 2, 4 }.  The
 Cauchy matrix is:

   1   2   4   8  16  32
 244  83  78 183 118  47
 167  39 213  59 153  82

 Divide row 2 by 244 and row 3 by 167.  Then extend it with a row of ones on 
 top and it's still MDS,
 and  that's the code for m=4, with RAID-6 as a subset.  Very nice!

 You got it Jim!

 Thanks,
 Andrea


Re: btrfs.h and btrfs-progs licensing

2013-09-10 Thread ronnie sahlberg
For crc32c you can use this one, which is LGPL:
https://github.com/sahlberg/libiscsi/blob/master/lib/crc32c.c


You can use the generator at :
http://www.ross.net/crc/download/crc_v3.txt
to generate this (and others)



On Mon, Sep 9, 2013 at 3:50 PM, David Sterba dste...@suse.cz wrote:
 On Mon, Sep 09, 2013 at 02:13:50PM -0700, Andy Grover wrote:
 However, as it stands, the kernel's include/uapi/linux/btrfs.h and
 btrfs-progs are GPLv2, which means that a libbtrfs that is based on either
 of these might also be construed to need to be GPLv2, and any program
 *using* libbtrfs might also be construed to need to be GPLv2.

 Looking at nm libbtrfs.a there are rbtree and crc32c symbols included
 in the library, licensed under GPLv2, the rest is pure userspace code and
 the people are around to ask about relicensing.

 david


Unmountable filesystem parent transid verify failed

2013-09-01 Thread ronnie sahlberg
Hi again.
Sorry for top posting.


I have a 9 disk filesystem that does not mount anymore and need some
help/advice so I can recover the data.

What happened was that I was running a btrfs device delete
under Ubuntu 13.04, kernel 3.8,
and after a long time of moving data around it crashed with a SEGV.

Now the filesystem does not mount and none of the recovery options I
have tried work.

I have upgraded to Debian testing and am now using kernel 3.10-2-amd64.



When I try btrfsck I get heaps of these :
Ignoring transid failure
parent transid verify failed on 24419581267968 wanted 301480 found 301495
parent transid verify failed on 24419581267968 wanted 301480 found 301495
parent transid verify failed on 24419581267968 wanted 301480 found 301495
parent transid verify failed on 24419581267968 wanted 301480 found 301495


I have tried using btrfs-image,
but it too eventually crashes with:

btrfs-image -c9 -t4 /dev/sde btrfs-image
...
btrfs-image: ctree.c:787: read_node_slot: Assertion `!(level == 0)' failed.
Aborted


mount -o ro,recovery fails
# mount -o ro,recovery /dev/sde /DATA
mount: wrong fs type, bad option, bad superblock on /dev/sde,
...


# btrfs-zero-log /dev/sde
eventually fails with :
btrfs-zero-log: ctree.c:342: __btrfs_cow_block: Assertion
`!(btrfs_header_generation(buf) > trans->transid)' failed.
Aborted


What should I try next?


regards
ronnie sahlberg


Unmountable BTRFS with parent transid verify failed

2013-08-31 Thread ronnie sahlberg
Hi,

I have a 9 disk raid1 filesystem that is no longer mountable.
I am using ubuntu 13.04 with kernel 3.8.0-26-generic


What happened was that I was removing a device using
btrfs device delete
and this was running for quite a while (I was removing a 3T device)
but eventually this failed with the btrfs command segfaulting.

Now I have rebooted, but the filesystem does not mount.
When I run btrfsck /dev/sde I get a lot of

parent transid verify failed on 3539986560 wanted 301481 found 301495
parent transid verify failed on 3539986560 wanted 301481 found 301495
parent transid verify failed on 3539986560 wanted 301481 found 301495
parent transid verify failed on 3539986560 wanted 301481 found 301495
Ignoring transid failure
leaf parent key incorrect 3539986560
leaf parent key incorrect 3536398464
bad block 3536398464

And while btrfsck eventually does complete, the filesystem remains unmountable.

Any advice ?


regards
ronnie sahlberg


RAID6 questions

2013-06-01 Thread ronnie sahlberg
Hi List,

I have a filesystem that is spanning about 10 devices.
It is currently using RAID1 for both data and metadata.

In order to get higher availability and be able to handle multi-device failures
I would like to change from RAID1 to RAID6.


Is it possible/stable/supported/recommended to change data from RAID1 to RAID6 ?
(I assume btrfs fi balance ...  is used for this?)

Metadata is currently RAID1; is it supported to put metadata on RAID6 too?
It would be odd to have less protection for metadata than for data.
Optimally I would like a mode where metadata is mirrored onto all the
spindles in the filesystem, not just 2 in RAID1 or n in RAID6.


I'm running a 3.8.0 kernel.


regards
ronnie sahlberg