btrfs raid56 Was: "csum failed" that was not detected by scrub

2014-05-02 Thread Duncan
Jaap Pieroen posted on Fri, 02 May 2014 17:48:13 + as excerpted:

> Duncan <1i5t5.duncan  cox.net> writes:
> 
> 
>> To those that know the details, this tells the story.
>> 
>> Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the
>> incomplete bits.  btrfs scrub doesn't know how to deal with raid5/6
>> properly just yet.

>> The raid5/6 page (which I didn't otherwise see conveniently linked, I
>> dug it out of the recent changes list since I knew it was there from
>> on-list discussion):
>> 
>> https://btrfs.wiki.kernel.org/index.php/RAID56

> So raid5 is much more useless than I assumed. I read Marc's blog and
> figured that btrfs was ready enough.
> 
> I'm really in trouble now. I tried to get rid of raid5 by doing a convert
> balance to raid1. But of course this triggered the same issue. And now I
> have a dead system because the first thing btrfs does after mounting is
> continue the balance which will crash the system and send me into a
> vicious loop.
> 
> - How can I stop btrfs from continuing balancing?

That one's easy.  See the Documentation/filesystems/btrfs.txt file in the 
kernel tree or the wiki for btrfs mount options, one of which is 
"skip_balance", to address this very sort of problem! =:^)

Alternatively, mounting it read-only should prevent further changes 
including the balance, at least allowing you to get the data off the 
filesystem.
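
For example, something along these lines (a sketch only, using the devices 
and mountpoint from your btrfs fi show output; double-check the options 
against the wiki/manpages for your kernel and progs versions):

mount -o skip_balance /dev/sde /home
btrfs balance status /home    # should report the balance as paused
btrfs balance cancel /home    # stop it for good before doing anything else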

> - How can I salvage this situation and convert to raid1?
> 
> Unfortunately I have few spare drives left. Not enough to contain
> 4.7TiB of data.. :(

[OK, this goes a bit philosophical, but it's something to think about...]

If you've done your research and followed the advice of the warnings you 
get when you run mkfs.btrfs, or those on the wiki, this isn't a problem, 
since you know that btrfs is still under heavy development and that, as a 
result, it's even more critical to have current, tested backups for 
anything you value anyway.  Simply use those backups.

Which, by definition, means that if you don't have such backups, you 
didn't consider the data all that valuable after all, actions perhaps 
giving the lie to your claims.  And no excuse for not doing the research 
either, since if you really care about your data, you research a 
filesystem you're not familiar with before trusting your data to it.  So 
again, if you didn't know btrfs was experimental and thus didn't have 
those backups, by definition your actions say you didn't really care 
about the data you put on it, no matter what your words might say.

OTOH, there *IS* such a thing as not realizing the value of something 
until you're in the process of losing it... that I do understand.  But of 
course try telling that to, for instance, someone who has just lost a 
loved one that they never actually /told/ them that...  Sometimes it's 
simply too late.  Tho if it's going to happen, at least here I'd much 
rather it happen to some data, than one of my own loved ones...


Anyway, at least for now you should still be able to recover most of the 
data using skip_balance or read-only mounting.  My guess is that if push 
comes to shove you can either prioritize that data and give up a TiB or 
two if it comes to that, or scrimp here and there, putting a few gigs on 
the odd blank DVD you may have lying around, or downgrading a few meals to 
ramen noodles to afford the $100 or so, shipped, that pricewatch says a new 
3 TB drive costs these days.  I've been there, and have found that if I 
think I need it bad enough, that $100 has a way of appearing, like I said 
even if I'm noodling it for a few meals to do it.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Help with space

2014-05-02 Thread Chris Murphy

On May 2, 2014, at 3:08 PM, Hugo Mills  wrote:

> On Fri, May 02, 2014 at 01:21:50PM -0600, Chris Murphy wrote:
>> 
>> On May 2, 2014, at 2:23 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>>> 
>>> Something tells me btrfs replace (not device replace, simply replace) 
>>> should be moved to btrfs device replace…
>> 
>> The syntax for "btrfs device" is different though; replace is like balance: 
>> btrfs balance start and btrfs replace start. And you can also get a status 
>> on it. We don't (yet) have options to stop, start, resume, which could maybe 
>> come in handy for long rebuilds and a reboot is required (?) although maybe 
>> that just gets handled automatically: set it to pause, then unmount, then 
>> reboot, then mount and resume.
>> 
>>> Well, I'd say two copies if it's only two devices in the raid1... would 
>>> be true raid1.  But if it's say four devices in the raid1, as is 
>>> certainly possible with btrfs raid1, that if it's not mirrored 4-way 
>>> across all devices, it's not true raid1, but rather some sort of hybrid 
>>> raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
>>> arranged that way, or some form that doesn't nicely fall into a well 
>>> defined raid level categorization.
>> 
>> Well, md raid1 is always n-way. So if you use -n 3 and specify three 
>> devices, you'll get 3-way mirroring (3 mirrors). But I don't know any 
>> hardware raid that works this way. They all seem to be raid 1 is strictly 
>> two devices. At 4 devices it's raid10, and only in pairs.
>> 
>> Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
>> like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
>> don't read code but based on how a 3 disk raid1 volume grows VDI files as 
>> it's filled it looks like 1GB chunks are copied like this
>> 
>> Disk1    Disk2   Disk3
>> 134  124 235
>> 679  578 689
>> 
>> So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
>> disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
>> 18GB of space, 6GB on each drive. You can't do this with any other raid1 as 
>> far as I know. You do definitely run out of space on one disk first though 
>> because of uneven metadata to data chunk allocation.
> 
>   The algorithm is that when the chunk allocator is asked for a block
> group (in pairs of chunks for RAID-1), it picks the number of chunks
> it needs, from different devices, in order of the device with the most
> free space. So, with disks of size 8, 4, 4, you get:
> 
> Disk 1: 12345678
> Disk 2: 1357
> Disk 3: 2468
> 
> and with 8, 8, 4, you get:
> 
> Disk 1: 1234568A
> Disk 2: 1234579A
> Disk 3: 6789

Sure, in my example I was assuming equal-size disks. But it's good to have an 
example with uneven disks too, because it shows all the more the flexibility 
btrfs replication has over the alternatives, with odd-numbered *and* unevenly 
sized disks.


Chris Murphy



Re: Help with space

2014-05-02 Thread Hugo Mills
On Fri, May 02, 2014 at 01:21:50PM -0600, Chris Murphy wrote:
> 
> On May 2, 2014, at 2:23 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> > 
> > Something tells me btrfs replace (not device replace, simply replace) 
> > should be moved to btrfs device replace…
> 
> The syntax for "btrfs device" is different though; replace is like balance: 
> btrfs balance start and btrfs replace start. And you can also get a status on 
> it. We don't (yet) have options to stop, start, resume, which could maybe 
> come in handy for long rebuilds and a reboot is required (?) although maybe 
> that just gets handled automatically: set it to pause, then unmount, then 
> reboot, then mount and resume.
> 
> > Well, I'd say two copies if it's only two devices in the raid1... would 
> > be true raid1.  But if it's say four devices in the raid1, as is 
> > certainly possible with btrfs raid1, that if it's not mirrored 4-way 
> > across all devices, it's not true raid1, but rather some sort of hybrid 
> > raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
> > arranged that way, or some form that doesn't nicely fall into a well 
> > defined raid level categorization.
> 
> Well, md raid1 is always n-way. So if you use -n 3 and specify three devices, 
> you'll get 3-way mirroring (3 mirrors). But I don't know any hardware raid 
> that works this way. They all seem to be raid 1 is strictly two devices. At 4 
> devices it's raid10, and only in pairs.
> 
> Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
> like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
> don't read code but based on how a 3 disk raid1 volume grows VDI files as 
> it's filled it looks like 1GB chunks are copied like this
> 
> Disk1 Disk2   Disk3
> 134   124 235
> 679   578 689
> 
> So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
> disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
> 18GB of space, 6GB on each drive. You can't do this with any other raid1 as 
> far as I know. You do definitely run out of space on one disk first though 
> because of uneven metadata to data chunk allocation.

   The algorithm is that when the chunk allocator is asked for a block
group (a pair of chunks for RAID-1), it picks the number of chunks it
needs from different devices, taking the devices in order of most free
space. So, with disks of size 8, 4, 4, you get:

Disk 1: 12345678
Disk 2: 1357
Disk 3: 2468

and with 8, 8, 4, you get:

Disk 1: 1234568A
Disk 2: 1234579A
Disk 3: 6789
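
(For anyone who wants to play with the pattern, here's a quick toy script.
It is not btrfs code, just the rule above with an arbitrary tie-break
between equally-free devices, so the chunk numbering at ties may differ
slightly from the tables.)

#!/bin/bash
# Toy model of the RAID-1 chunk allocation order described above:
# each block group takes one chunk from each of the two devices
# with the most free space.  Sizes are counted in whole chunks.
free=(8 4 4)    # try (8 8 4) for the second example
bg=0
while :; do
    bg=$((bg + 1))
    # device indices ordered by remaining free space, largest first
    order=($(for i in "${!free[@]}"; do echo "${free[$i]} $i"; done \
             | sort -k1,1nr -k2,2n | awk '{print $2}'))
    d1=${order[0]}; d2=${order[1]}
    [ "${free[$d2]}" -gt 0 ] || break   # need two devices with space left
    echo "chunk $bg -> disk $((d1 + 1)) and disk $((d2 + 1))"
    free[$d1]=$((free[$d1] - 1))
    free[$d2]=$((free[$d2] - 1))
done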

   Hugo.

> Anyway I think we're off the rails with raid1 nomenclature as soon as we have 
> 3 devices. It's probably better to call it replication, with an assumed 
> default of 2 replicates unless otherwise specified.
> 
> There's definitely a benefit to a 3 device volume with 2 replicates, 
> efficiency wise. As soon as we go to four disks 2 replicates it makes more 
> sense to do raid10, although I haven't tested odd device raid10 setups so I'm 
> not sure what happens.
> 
> 
> Chris Murphy
> 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Prisoner unknown:  Return to Zenda. ---   




Re: Unable to boot

2014-05-02 Thread Chris Murphy

On May 2, 2014, at 4:00 AM, George Pochiscan  wrote:

> Hello,
> 
> I have a problem with a server with Fedora 20 and BTRFS. This server had 
> frequent hard restarts before the filesystem got corrupt and we are unable to 
> boot it.
> 
> We have a HP Proliant server with 4 disks @1TB each and Software RAID 5.
> It had Debian installed (i don't know the version) and right now i'm using 
> fedora 20 live to try to rescue the  system.

Fedora 20 Live has kernel 3.11.10 and btrfs-progs 
0.20.rc1.20131114git9f0c53f-1.fc20. So the general rule of thumb, without 
knowing exactly what the problem and solution are, is to try a much newer kernel 
and btrfs-progs, such as a Fedora Rawhide live image. These are built daily, but 
don't always succeed, so you can go here to find the latest of everything:

https://apps.fedoraproject.org/releng-dash/

Find Fedora Live Desktop or Live KDE and click on details. Click the green link 
under descendants / livecd. Then under the Output listing you'll see an ISO you 
can download; the one there right now is 
Fedora-Live-Desktop-x86_64-rawhide-20140502.iso, but of course this changes 
daily.

You might want to boot with the kernel parameter slub_debug=- (that's a minus 
symbol), because all Rawhide kernels except the ones built on Mondays have a 
bunch of kernel debug options enabled, which makes them quite slow.


> 
> When we try btrfsck /dev/md127 i have a lot of checksum errors, and the 
> output is: 
> 
> Checking filesystem on /dev/md127
> UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
> checking extents
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> Csum didn't match
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> Csum didn't match
> -
> 
> extent buffer leak: start 1006686208 len 4096
> found 32039247396 bytes used err is -22
> total csum bytes: 41608612
> total tree bytes: 388857856
> total fs tree bytes: 310124544
> total extent tree bytes: 22016000
> btree space waste bytes: 126431234
> file data blocks allocated: 47227326464
> referenced 42595635200
> Btrfs v3.12


I suggest a recent Rawhide build. And I suggest just trying to mount the file 
system normally first, and posting anything that appears in dmesg. If the 
mount fails, then try the mount option -o recovery, post any dmesg 
messages from that too, and note whether or not it mounts. Finally, if that 
doesn't work either, see whether -o ro,recovery works and what kernel messages 
you get.
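
Roughly this sequence from the live environment, using the device name from 
your btrfsck output (the mountpoint is arbitrary):

mkdir -p /mnt/btrfs
mount /dev/md127 /mnt/btrfs                   # plain mount first
dmesg | tail -n 50                            # post whatever shows up here
mount -o recovery /dev/md127 /mnt/btrfs       # only if the plain mount fails
mount -o ro,recovery /dev/md127 /mnt/btrfs    # last resort, read-only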


> 
> 
> 
> When i attempt to repair i have the following error:
> -
> Backref 1005817856 parent 5 root 5 not found in extent tree
> backpointer mismatch on [1005817856 4096]
> owner ref check failed [1006686208 4096]
> repaired damaged extent references
> Failed to find [1000525824, 168, 4096]
> btrfs unable to find ref byte nr 1000525824 parent 0 root 1  owner 1 offset 0
> btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
> Aborted
> 

You really shouldn't use --repair right off the bat; it's not a recommended 
early step. You should try normal mounting with newer kernels first, then 
the recovery mount options. Sometimes the repair option makes things worse. 
I'm not sure what its safety status is as of v3.14.

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

Fedora already includes btrfs-zero-log, so depending on the kernel messages you 
might try that before a btrfsck --repair.
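
That is, roughly, and only as a late step once mounting attempts with newer 
kernels have failed (both of these modify the filesystem):

btrfs-zero-log /dev/md127    # only if dmesg complains about log tree replay
btrfsck --repair /dev/md127  # genuinely last resort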



Chris Murphy



Re: Help with space

2014-05-02 Thread Chris Murphy

On May 2, 2014, at 2:23 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> 
> Something tells me btrfs replace (not device replace, simply replace) 
> should be moved to btrfs device replace…

The syntax for "btrfs device" is different though; replace is like balance: 
btrfs balance start and btrfs replace start. And you can also get a status on 
it. We don't (yet) have options to stop, pause, or resume it, which could come 
in handy for long rebuilds where a reboot is required, although maybe that 
just gets handled automatically: set it to pause, then unmount, then reboot, 
then mount and resume.
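
For illustration, the parallel looks roughly like this (paths and devices 
are placeholders):

btrfs balance start /mnt
btrfs balance status /mnt
btrfs replace start /dev/sdX /dev/sdY /mnt
btrfs replace status /mnt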

> Well, I'd say two copies if it's only two devices in the raid1... would 
> be true raid1.  But if it's say four devices in the raid1, as is 
> certainly possible with btrfs raid1, that if it's not mirrored 4-way 
> across all devices, it's not true raid1, but rather some sort of hybrid 
> raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
> arranged that way, or some form that doesn't nicely fall into a well 
> defined raid level categorization.

Well, md raid1 is always n-way. So if you use -n 3 and specify three devices, 
you'll get 3-way mirroring (3 mirrors). But I don't know any hardware raid that 
works this way. They all seem to treat raid1 as strictly two devices. At 4 
devices it's raid10, and only in pairs.

Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
don't read code, but based on how a 3-disk raid1 volume grows VDI files as it's 
filled, it looks like 1GB chunks are copied like this:

Disk1   Disk2   Disk3
134 124 235
679 578 689

So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
18GB of space, 6GB on each drive. You can't do this with any other raid1 as far 
as I know. You do definitely run out of space on one disk first though because 
of uneven metadata to data chunk allocation.

Anyway I think we're off the rails with raid1 nomenclature as soon as we have 3 
devices. It's probably better to call it replication, with an assumed default 
of 2 replicas unless otherwise specified.

There's definitely a benefit to a 3-device volume with 2 replicas, efficiency-wise. 
As soon as we go to four disks with 2 replicas, it makes more sense to do 
raid10, although I haven't tested odd-device raid10 setups so I'm not sure what 
happens.


Chris Murphy



Re: "csum failed" that was not detected by scrub

2014-05-02 Thread Jaap Pieroen
Shilong Wang  gmail.com> writes:

> 
> Hello,
> 
> There is a known RAID5/6 bug, i sent a patch to address this problem.
> Could you please double check if your kernel source includes the
> following commit:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3b080b2564287be91605bfd1d5ee985696e61d3c
> 
> RAID5/6 should detect checksum mismatch, it can not fix errors now.
> 
> Thanks,
> Wang

Your patch seems to be in 3.15-rc1:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.15-rc1-trusty/CHANGES

I tried rc3, but that made my system crash on boot... I'm having bad luck.



Re:

2014-05-02 Thread Jaap Pieroen
Duncan <1i5t5.duncan  cox.net> writes:

> 
> To those that know the details, this tells the story.
> 
> Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the 
> incomplete bits.  btrfs scrub doesn't know how to deal with raid5/6 
> properly just yet.
> 
> While the operational bits of raid5/6 support are there, parity is 
> calculated and written, scrub, and recovery from a lost device, are not 
> yet code complete.  Thus, it's effectively a slower, lower capacity raid0 
> without scrub support at this point, except that when the code is 
> complete, you'll get an automatic "free" upgrade to full raid5 or raid6, 
> because the operational bits have been working since they were 
> introduced, just the recovery and scrub bits were bad, making it 
> effectively a raid0 in reliability terms, lose one and you've lost them 
> all.
> 
> That's the big picture anyway.  Marc Merlin recently did quite a bit of 
> raid5/6 testing and there's a page on the wiki now with what he found.  
> Additionally, I saw a scrub support for raid5/6 modes patch on the list 
> recently, but while it may be in integration, I believe it's too new to 
> have reached release yet.
> 
> Wiki, for memory or bookmark: https://btrfs.wiki.kernel.org
> 
> Direct user documentation link for bookmark (unwrap as necessary):
> 
> https://btrfs.wiki.kernel.org/index.php/Main_Page#Guides_and_usage_information
> 
> The raid5/6 page (which I didn't otherwise see conveniently linked, I dug 
> it out of the recent changes list since I knew it was there from on-list 
> discussion):
> 
> https://btrfs.wiki.kernel.org/index.php/RAID56
> 
> @ Marc or Hugo or someone with a wiki account:  Can this be more visibly 
> linked from the user-docs contents, added to the user docs category list, 
> and probably linked from at least the multiple devices and (for now) the 
> gotchas pages?
> 

So raid5 is much more useless than I assumed. I read Marc's blog and
figured that btrfs was ready enough.

I'm really in trouble now. I tried to get rid of raid5 by doing a convert
balance to raid1. But of course this triggered the same issue. And now
I have a dead system because the first thing btrfs does after mounting
is continue the balance which will crash the system and send me into
a vicious loop.

- How can I stop btrfs from continuing balancing?
- How can I salvage this situation and convert to raid1?

Unfortunately I have few spare drives left. Not enough to contain
4.7TiB of data.. :(






Re: [PATCH] Btrfs: do not increment on bio_index one by one

2014-05-02 Thread David Sterba
On Tue, Apr 29, 2014 at 01:07:58PM +0800, Liu Bo wrote:
> 'bio_index' is just a index, it's really not necessary to do increment
> one by one.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 


Re: [PATCH 6/6 v2] Btrfs: add send_stream_version attribute to sysfs

2014-05-02 Thread Filipe David Manana
On Fri, May 2, 2014 at 4:46 PM, David Sterba  wrote:
> On Sun, Apr 20, 2014 at 10:40:03PM +0100, Filipe David Borba Manana wrote:
>> So that applications can find out what's the highest send stream
>> version supported/implemented by the running kernel:
>>
>> $ cat /sys/fs/btrfs/send/stream_version
>> 2
>>
>> Signed-off-by: Filipe David Borba Manana 
>> ---
>>
>> V2: Renamed /sys/fs/btrfs/send_stream_version to 
>> /sys/fs/btrfs/send/stream_version,
>> as in the future it might be useful to add other sysfs attributes related 
>> to
>> send (other ro information or tunables like internal buffer sizes, etc).
>
> Sounds good, I don't see any issue with the separate directory. Mixing
> it with /sys/fs/btrfs/features does not seem suitable for that if you
> intend adding more entries.

Yeah, the only reason I didn't mix it with the features subdir is that it
relates to features that are settable; plus there are 2 versions of it,
one global and one per fs (uuid) subdirectory (and it felt odd to me
to add it to one of those subdirs and not the other).

Thanks David

>
> Reviewed-by: David Sterba 



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: [PATCH 6/6 v2] Btrfs: add send_stream_version attribute to sysfs

2014-05-02 Thread David Sterba
On Sun, Apr 20, 2014 at 10:40:03PM +0100, Filipe David Borba Manana wrote:
> So that applications can find out what's the highest send stream
> version supported/implemented by the running kernel:
> 
> $ cat /sys/fs/btrfs/send/stream_version
> 2
> 
> Signed-off-by: Filipe David Borba Manana 
> ---
> 
> V2: Renamed /sys/fs/btrfs/send_stream_version to 
> /sys/fs/btrfs/send/stream_version,
> as in the future it might be useful to add other sysfs attributes related 
> to
> send (other ro information or tunables like internal buffer sizes, etc).

Sounds good, I don't see any issue with the separate directory. Mixing
it with /sys/fs/btrfs/features does not seem suitable for that if you
intend adding more entries.

Reviewed-by: David Sterba 


Re: [PATCH 1/4] Btrfs-progs: send, bump stream version

2014-05-02 Thread David Sterba
On Tue, Apr 15, 2014 at 05:40:48PM +0100, Filipe David Borba Manana wrote:
> This increases the send stream version from version 1 to version 2, adding
> 2 new commands:
> 
> 1) total data size - used to tell the receiver how much file data the stream
>will add or update;
> 
> 2) fallocate - used to pre-allocate space for files and to punch holes in 
> files.
> 
> This is preparation work for subsequent changes that implement the new 
> features
> (computing total data size and use fallocate for better performance).
> 
> Signed-off-by: Filipe David Borba Manana 

The changes in the v2/3/4 look good, thanks.  Patches added to the next
integration.


Re: [PATCH 07/14] btrfs-progs: Print more info about device sizes

2014-05-02 Thread David Sterba
On Wed, Apr 30, 2014 at 02:37:16PM +0100, David Taylor wrote:
> It makes more sense to me than 'Occupied' and seems cleaner than
> 'Resized To'.  It sort of mirrors how LVM describes PV / VG / LV
> sizes, too.

Do you have a concrete example of how we could map the btrfs sizes based on
LVM?


Re: [PATCH 07/14] btrfs-progs: Print more info about device sizes

2014-05-02 Thread David Sterba
On Wed, Apr 30, 2014 at 07:38:00PM +0200, Goffredo Baroncelli wrote:
> On 04/30/2014 03:37 PM, David Taylor wrote:
> > On Wed, 30 Apr 2014, Frank Kingswood wrote:
> >> On 30/04/14 13:11, David Sterba wrote:
> >>> On Wed, Apr 30, 2014 at 01:39:27PM +0200, Goffredo Baroncelli wrote:
> 
>  I found a bit unclear the "FS occupied" terms.
> >>>
> >>> We're running out of terms to describe and distinguish the space that
> >>> the filesystem uses.
> >>>
> >>> 'occupied' seemed like a good choice to me, though it may be not obvious
> >>
> >> The space that the filesystem uses in total seems to me is called the
> >> "size". It has nothing to do with utilization.
> >>
> >> /dev/sda6, ID: 2
> >> Device size:10.00GiB
> >> Filesystem size: 5.00GiB
> > 
> > FS size was what I was about to suggest, before I saw your reply.
> 
> Pay attention that this value is not the Filesystem size, 
> but the maximum space of THE DEVICE that the filesystem is allowed to use.

I agree that plain 'Filesystem size' could be misleading; using a term
that has an established meaning can cause misunderstandings in
bug reports.


Re: "csum failed" that was not detected by scrub

2014-05-02 Thread Shilong Wang
Hello,


2014-05-02 17:42 GMT+08:00 Jaap Pieroen :
> Hi all,
>
> I completed a full scrub:
> root@nasbak:/home/jpieroen# btrfs scrub status /home/
> scrub status for 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
> scrub started at Wed Apr 30 08:30:19 2014 and finished after 144131 seconds
> total bytes scrubbed: 4.76TiB with 0 errors
>
> Then tried to remove a device:
> root@nasbak:/home/jpieroen# btrfs device delete /dev/sdb /home
>
> This triggered bug_on, with the following error in dmesg: csum failed
> ino 258 off 1395560448 csum 2284440321 expected csum 319628859
>
> How can there still be csum failures directly after a scrub?
> If I rerun the scrub it still won't find any errors. I know this,
> because I've had the same issue 3 times in a row. Each time running a
> scrub and still being unable to remove the device.

There is a known RAID5/6 bug; I sent a patch to address this problem.
Could you please double-check whether your kernel source includes the
following commit:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3b080b2564287be91605bfd1d5ee985696e61d3c

RAID5/6 should detect checksum mismatches, but it cannot fix the errors yet.
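
If you have a kernel git tree handy, one way to check is something like 
this (for a distro kernel, check its package changelog instead):

# lists the tags whose history contains the fix
git tag --contains 3b080b2564287be91605bfd1d5ee985696e61d3c
# or a yes/no answer against the currently checked-out HEAD
git merge-base --is-ancestor 3b080b2564287be91605bfd1d5ee985696e61d3c HEAD && echo present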

Thanks,
Wang
>
> Kind Regards,
> Jaap
>
> --
> Details:
>
> root@nasbak:/home/jpieroen#   uname -a
> Linux nasbak 3.14.1-031401-generic #201404141220 SMP Mon Apr 14
> 16:21:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> root@nasbak:/home/jpieroen#   btrfs --version
> Btrfs v3.14.1
>
> root@nasbak:/home/jpieroen#   btrfs fi df /home
> Data, RAID5: total=4.57TiB, used=4.55TiB
> System, RAID1: total=32.00MiB, used=352.00KiB
> Metadata, RAID1: total=7.00GiB, used=5.59GiB
>
> root@nasbak:/home/jpieroen# btrfs fi show
> Label: 'btrfs_storage'  uuid: 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
> Total devices 6 FS bytes used 4.56TiB
> devid1 size 1.82TiB used 1.31TiB path /dev/sde
> devid2 size 1.82TiB used 1.31TiB path /dev/sdf
> devid3 size 1.82TiB used 1.31TiB path /dev/sdg
> devid4 size 931.51GiB used 25.00GiB path /dev/sdb
> devid6 size 2.73TiB used 994.03GiB path /dev/sdh
> devid7 size 2.73TiB used 994.03GiB path /dev/sdi
>
> Btrfs v3.14.1
>
> jpieroen@nasbak:~$ dmesg
> [227248.656438] BTRFS info (device sdi): relocating block group
> 9735225016320 flags 129
> [227261.713860] BTRFS info (device sdi): found 9 extents
> [227264.531019] BTRFS info (device sdi): found 9 extents
> [227265.011826] BTRFS info (device sdi): relocating block group
> 76265029632 flags 129
> [227274.052249] BTRFS info (device sdi): csum failed ino 258 off
> 1395560448 csum 2284440321 expected csum 319628859
> [227274.052354] BTRFS info (device sdi): csum failed ino 258 off
> 1395564544 csum 3646299263 expected csum 319628859
> [227274.052402] BTRFS info (device sdi): csum failed ino 258 off
> 1395568640 csum 281259278 expected csum 319628859
> [227274.052449] BTRFS info (device sdi): csum failed ino 258 off
> 1395572736 csum 2594807184 expected csum 319628859
> [227274.052492] BTRFS info (device sdi): csum failed ino 258 off
> 1395576832 csum 4288971971 expected csum 319628859
> [227274.052537] BTRFS info (device sdi): csum failed ino 258 off
> 1395580928 csum 752615894 expected csum 319628859
> [227274.052581] BTRFS info (device sdi): csum failed ino 258 off
> 1395585024 csum 3828951500 expected csum 319628859
> [227274.061279] [ cut here ]
> [227274.061354] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
> [227274.061445] invalid opcode:  [#1] SMP
> [227274.061509] Modules linked in: cuse deflate
> [227274.061573] BTRFS info (device sdi): csum failed ino 258 off
> 1395560448 csum 2284440321 expected csum 319628859
> [227274.061707]  ctr twofish_generic twofish_x86_64_3way
> twofish_x86_64 twofish_common camellia_generic camellia_x86_64
> serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper
> blowfish_generic blowfish_x86_64 blowfish_common cast5_generic
> cast_common ablk_helper cryptd des_generic cmac xcbc rmd160
> crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
> fscache dm_crypt ip6t_REJECT ppdev xt_hl ip6t_rt nf_conntrack_ipv6
> nf_defrag_ipv6 ipt_REJECT xt_comment xt_LOG kvm xt_recent microcode
> xt_multiport xt_limit xt_tcpudp psmouse serio_raw xt_addrtype k10temp
> edac_core ipt_MASQUERADE edac_mce_amd iptable_nat nf_nat_ipv4
> sp5100_tco nf_conntrack_ipv4 nf_defrag_ipv4 ftdi_sio i2c_piix4
> usbserial xt_conntrack ip6table_filter ip6_tables joydev
> nf_conntrack_netbios_ns nf_conntrack_broadcast snd_hda_codec_via
> nf_nat_ftp snd_hda_codec_hdmi nf_nat snd_hda_codec_generic
> nf_conntrack_ftp nf_conntrack snd_hda_intel iptable_filter
> ir_lirc_codec(OF) lirc_dev(OF) ip_tables snd_hda_codec
> ir_mce_kbd_decoder(OF) x_tables snd_hwdep ir_sony_decoder(OF)
> rc_tbs_nec(OF) ir_jvc_decoder(OF) snd_pcm ir_rc6_decoder(OF)
> ir_rc5_decoder(OF) saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF)
> ir_nec_decoder(OF) tbs6923fe(POF) tbs6985se(POF) t

Re: "csum failed" that was not detected by scrub

2014-05-02 Thread Duncan
Jaap Pieroen posted on Fri, 02 May 2014 11:42:35 +0200 as excerpted:

> I completed a full scrub:
> root@nasbak:/home/jpieroen# btrfs scrub status /home/
> scrub status for 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
> scrub started at Wed Apr 30 08:30:19 2014
> and finished after 144131 seconds
> total bytes scrubbed: 4.76TiB with 0 errors
> 
> Then tried to remove a device:
> root@nasbak:/home/jpieroen# btrfs device delete /dev/sdb /home
> 
> This triggered bug_on, with the following error in dmesg: csum failed
> ino 258 off 1395560448 csum 2284440321 expected csum 319628859
> 
> How can there still be csum failures directly after a scrub?

Simple enough, really...

> root@nasbak:/home/jpieroen#   btrfs fi df /home
> Data, RAID5: total=4.57TiB, used=4.55TiB
> System, RAID1: total=32.00MiB, used=352.00KiB
> Metadata, RAID1: total=7.00GiB, used=5.59GiB

To those that know the details, this tells the story.

Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the 
incomplete bits.  btrfs scrub doesn't know how to deal with raid5/6 
properly just yet.

While the operational bits of raid5/6 support are there (parity is 
calculated and written), scrub and recovery from a lost device are not 
yet code-complete.  Thus, it's effectively a slower, lower-capacity raid0 
without scrub support at this point, except that when the code is 
complete you'll get an automatic "free" upgrade to full raid5 or raid6, 
because the operational bits have been working since they were 
introduced; only the recovery and scrub bits are missing, making it 
effectively a raid0 in reliability terms: lose one device and you've 
lost them all.

That's the big picture anyway.  Marc Merlin recently did quite a bit of 
raid5/6 testing and there's a page on the wiki now with what he found.  
Additionally, I saw a patch adding scrub support for the raid5/6 modes on the 
list recently, but while it may be in integration, I believe it's too new to 
have reached release yet.

Wiki, for memory or bookmark: https://btrfs.wiki.kernel.org

Direct user documentation link for bookmark (unwrap as necessary):

https://btrfs.wiki.kernel.org/index.php/Main_Page#Guides_and_usage_information

The raid5/6 page (which I didn't otherwise see conveniently linked, I dug 
it out of the recent changes list since I knew it was there from on-list 
discussion):

https://btrfs.wiki.kernel.org/index.php/RAID56


@ Marc or Hugo or someone with a wiki account:  Can this be more visibly 
linked from the user-docs contents, added to the user docs category list, 
and probably linked from at least the multiple devices and (for now) the 
gotchas pages?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Unable to boot

2014-05-02 Thread George Pochiscan
Hello,

I have a problem with a server with Fedora 20 and BTRFS. This server had 
frequent hard restarts before the filesystem got corrupted, and now we are 
unable to boot it.

We have an HP ProLiant server with 4 disks @ 1TB each and software RAID 5.
It had Debian installed (I don't know the version) and right now I'm using 
Fedora 20 live to try to rescue the system.

When we try btrfsck /dev/md127 I have a lot of checksum errors, and the output 
is: 

Checking filesystem on /dev/md127
UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
checking extents
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
Csum didn't match
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
Csum didn't match
-

extent buffer leak: start 1006686208 len 4096
found 32039247396 bytes used err is -22
total csum bytes: 41608612
total tree bytes: 388857856
total fs tree bytes: 310124544
total extent tree bytes: 22016000
btree space waste bytes: 126431234
file data blocks allocated: 47227326464
 referenced 42595635200
Btrfs v3.12



When I attempt to repair, I have the following error:
-
Backref 1005817856 parent 5 root 5 not found in extent tree
backpointer mismatch on [1005817856 4096]
owner ref check failed [1006686208 4096]
repaired damaged extent references
Failed to find [1000525824, 168, 4096]
btrfs unable to find ref byte nr 1000525824 parent 0 root 1  owner 1 offset 0
btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
Aborted





I have installed btrfs version 3.12

Linux localhost 3.11.10-301.fc20.x86_64 #1 SMP Thu Dec 5 14:01:17 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux

[root@localhost liveuser]# btrfs fi show
Label: none  uuid: e068faf0-2c16-4566-9093-e6d1e21a5e3c
Total devices 1 FS bytes used 40.04GiB
devid1 size 1.82TiB used 43.04GiB path /dev/md127
Btrfs v3.12


Please advise.

Thank you,
George Pochiscan


"csum failed" that was not detected by scrub

2014-05-02 Thread Jaap Pieroen
Hi all,

I completed a full scrub:
root@nasbak:/home/jpieroen# btrfs scrub status /home/
scrub status for 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
scrub started at Wed Apr 30 08:30:19 2014 and finished after 144131 seconds
total bytes scrubbed: 4.76TiB with 0 errors

Then tried to remove a device:
root@nasbak:/home/jpieroen# btrfs device delete /dev/sdb /home

This triggered bug_on, with the following error in dmesg: csum failed
ino 258 off 1395560448 csum 2284440321 expected csum 319628859

How can there still be csum failures directly after a scrub?
If I rerun the scrub it still won't find any errors. I know this,
because I've had the same issue 3 times in a row. Each time running a
scrub and still being unable to remove the device.

Kind Regards,
Jaap

--
Details:

root@nasbak:/home/jpieroen#   uname -a
Linux nasbak 3.14.1-031401-generic #201404141220 SMP Mon Apr 14
16:21:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

root@nasbak:/home/jpieroen#   btrfs --version
Btrfs v3.14.1

root@nasbak:/home/jpieroen#   btrfs fi df /home
Data, RAID5: total=4.57TiB, used=4.55TiB
System, RAID1: total=32.00MiB, used=352.00KiB
Metadata, RAID1: total=7.00GiB, used=5.59GiB

root@nasbak:/home/jpieroen# btrfs fi show
Label: 'btrfs_storage'  uuid: 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
Total devices 6 FS bytes used 4.56TiB
devid1 size 1.82TiB used 1.31TiB path /dev/sde
devid2 size 1.82TiB used 1.31TiB path /dev/sdf
devid3 size 1.82TiB used 1.31TiB path /dev/sdg
devid4 size 931.51GiB used 25.00GiB path /dev/sdb
devid6 size 2.73TiB used 994.03GiB path /dev/sdh
devid7 size 2.73TiB used 994.03GiB path /dev/sdi

Btrfs v3.14.1

jpieroen@nasbak:~$ dmesg
[227248.656438] BTRFS info (device sdi): relocating block group
9735225016320 flags 129
[227261.713860] BTRFS info (device sdi): found 9 extents
[227264.531019] BTRFS info (device sdi): found 9 extents
[227265.011826] BTRFS info (device sdi): relocating block group
76265029632 flags 129
[227274.052249] BTRFS info (device sdi): csum failed ino 258 off
1395560448 csum 2284440321 expected csum 319628859
[227274.052354] BTRFS info (device sdi): csum failed ino 258 off
1395564544 csum 3646299263 expected csum 319628859
[227274.052402] BTRFS info (device sdi): csum failed ino 258 off
1395568640 csum 281259278 expected csum 319628859
[227274.052449] BTRFS info (device sdi): csum failed ino 258 off
1395572736 csum 2594807184 expected csum 319628859
[227274.052492] BTRFS info (device sdi): csum failed ino 258 off
1395576832 csum 4288971971 expected csum 319628859
[227274.052537] BTRFS info (device sdi): csum failed ino 258 off
1395580928 csum 752615894 expected csum 319628859
[227274.052581] BTRFS info (device sdi): csum failed ino 258 off
1395585024 csum 3828951500 expected csum 319628859
[227274.061279] [ cut here ]
[227274.061354] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
[227274.061445] invalid opcode:  [#1] SMP
[227274.061509] Modules linked in: cuse deflate
[227274.061573] BTRFS info (device sdi): csum failed ino 258 off
1395560448 csum 2284440321 expected csum 319628859
[227274.061707]  ctr twofish_generic twofish_x86_64_3way
twofish_x86_64 twofish_common camellia_generic camellia_x86_64
serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper
blowfish_generic blowfish_x86_64 blowfish_common cast5_generic
cast_common ablk_helper cryptd des_generic cmac xcbc rmd160
crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
fscache dm_crypt ip6t_REJECT ppdev xt_hl ip6t_rt nf_conntrack_ipv6
nf_defrag_ipv6 ipt_REJECT xt_comment xt_LOG kvm xt_recent microcode
xt_multiport xt_limit xt_tcpudp psmouse serio_raw xt_addrtype k10temp
edac_core ipt_MASQUERADE edac_mce_amd iptable_nat nf_nat_ipv4
sp5100_tco nf_conntrack_ipv4 nf_defrag_ipv4 ftdi_sio i2c_piix4
usbserial xt_conntrack ip6table_filter ip6_tables joydev
nf_conntrack_netbios_ns nf_conntrack_broadcast snd_hda_codec_via
nf_nat_ftp snd_hda_codec_hdmi nf_nat snd_hda_codec_generic
nf_conntrack_ftp nf_conntrack snd_hda_intel iptable_filter
ir_lirc_codec(OF) lirc_dev(OF) ip_tables snd_hda_codec
ir_mce_kbd_decoder(OF) x_tables snd_hwdep ir_sony_decoder(OF)
rc_tbs_nec(OF) ir_jvc_decoder(OF) snd_pcm ir_rc6_decoder(OF)
ir_rc5_decoder(OF) saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF)
ir_nec_decoder(OF) tbs6923fe(POF) tbs6985se(POF) tbs6928se(POF)
tbs6982se(POF) tbs6991fe(POF) tbs6618fe(POF) saa716x_core(OF)
tbs6922fe(POF) tbs6928fe(POF) tbs6991se(POF) stv090x(OF) dvb_core(OF)
rc_core(OF) snd_timer snd soundcore asus_atk0110 parport_pc shpchp
mac_hid lp parport btrfs xor raid6_pq pata_acpi hid_generic usbhid hid
usb_storage radeon pata_atiixp r8169 mii i2c_algo_bit sata_sil24 ttm
drm_kms_helper drm ahci libahci wmi
[227274.064118] CPU: 1 PID: 15543 Comm: btrfs-endio-4 Tainted: PF
O 3.14.1-031401-generic #201404141220
[227274.064246] Hardware name: System manufacturer System Product
Name/M4A78LT-M, BIOS 

Re: Help with space

2014-05-02 Thread Brendan Hide

On 02/05/14 10:23, Duncan wrote:

> Russell Coker posted on Fri, 02 May 2014 11:48:07 +1000 as excerpted:
> 
>> On Thu, 1 May 2014, Duncan <1i5t5.dun...@cox.net> wrote:
>> [snip]
>> http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf
>> 
>> Whether a true RAID-1 means just 2 copies or N copies is a matter of
>> opinion. Papers such as the above seem to clearly imply that RAID-1 is
>> strictly 2 copies of data.
> 
> Thanks for that link. =:^)
> 
> My position would be that reflects the original, but not the modern,
> definition.  The paper seems to describe as raid1 what would later come
> to be called raid1+0, which quickly morphed into raid10, leaving the
> raid1 description only covering pure mirror-raid.
Personally I'm flexible on using the terminology in day-to-day 
operations and discussion due to the fact that the end-result is "close 
enough". But ...


The definition of "RAID 1" is still only a mirror of two devices. As far 
as I'm aware, Linux's mdraid is the only raid system in the world that 
allows N-way mirroring while still referring to it as "RAID1". Due to 
the way it handles data in chunks, and also due to its "rampant layering 
violations", *technically* btrfs's "RAID-like" features are not "RAID".


To differentiate from "RAID", we're already using lowercase "raid" and, 
in the long term, some of us are also looking to do away with "raid{x}" 
terms altogether, in favour of what Hugo and I last termed "csp notation". 
Changing the terminology is important, but it is particularly non-urgent.


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Negative qgroup sizes

2014-05-02 Thread Alin Dobre
Thanks for the response, Duncan.

On 01/05/14 17:58, Duncan wrote:
> 
> Tho you are slightly outdated on your btrfs-progs version, 3.14.1 being 
> current.  But I think the code in question is kernel code and the progs 
> simply report it, so I don't think that can be the problem in this case.

Yes, I'm aware that the 3.14 version of btrfs-progs was already out, but
only for a couple of weeks, and I'm pretty sure that the kernel
code (which does the real-time accounting) is what's broken.

> So if you are doing snapshots, you can try not doing them (switching to 
> conventional backup if necessary) and see if that stabilizes your 
> numbers.  If so, you know there's still more problems in that area.
> 
> Of course if the subvolumes involved aren't snapshotted, then the problem 
> must be elsewhere, but I do know the snapshotting case /is/ reasonably 
> difficult to get right... while staying within a reasonable performance 
> envelope at least.
> 

I had already searched and found some patches around this issue, but I
thought I'd also mention it on this mailing list in the hope that I had
somehow missed something. The subvolumes are very likely to be
snapshotted, so this might indeed be the case.
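
For reference, the numbers can be watched with something like this (and a 
rescan triggered where supported, in case that straightens them out):

btrfs qgroup show /path/to/fs
btrfs quota rescan /path/to/fs    # needs a recent enough kernel/progs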

Cheers,
Alin.


Re: Help with space

2014-05-02 Thread Duncan
Russell Coker posted on Fri, 02 May 2014 11:48:07 +1000 as excerpted:

> On Thu, 1 May 2014, Duncan <1i5t5.dun...@cox.net> wrote:
> 
> Am I missing something or is it impossible to do a disk replace on BTRFS
> right now?
> 
> I can delete a device, I can add a device, but I'd like to replace a
> device.

You're missing something... but it's an easy thing to miss, as I almost missed it too 
even tho I was sure it was there.

Something tells me btrfs replace (not device replace, simply replace) 
should be moved to btrfs device replace...
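
That is, something along these lines (a sketch from memory; the devices are 
placeholders, and the source can also be given as a devid if the old disk is 
already dead; see the manpage for the exact syntax):

btrfs replace start /dev/old-disk /dev/new-disk /mountpoint
btrfs replace status /mountpoint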

> http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf
> 
> Whether a true RAID-1 means just 2 copies or N copies is a matter of
> opinion. Papers such as the above seem to clearly imply that RAID-1 is
> strictly 2 copies of data.

Thanks for that link. =:^)

My position would be that reflects the original, but not the modern, 
definition.  The paper seems to describe as raid1 what would later come 
to be called raid1+0, which quickly morphed into raid10, leaving the 
raid1 description only covering pure mirror-raid.

And even then, the paper says mirrors in spots without specifically 
defining it as (only) two mirrors, but in others it seems to /assume/, 
without further explanation, just two mirrors.  So I'd argue that even 
then the definition of raid1 allowed more than two mirrors, but that it 
just so happened that the examples and formulae given dealt with only two 
mirrors.

Tho certainly I can see the room for differing opinions on the matter as 
well.

> I don't have a strong opinion on how many copies of data can be involved
> in a RAID-1, but I think that there's no good case to claim that only 2
> copies means that something isn't "true RAID-1".

Well, I'd say two copies if it's only two devices in the raid1... would 
be true raid1.  But if it's say four devices in the raid1, as is 
certainly possible with btrfs raid1, that if it's not mirrored 4-way 
across all devices, it's not true raid1, but rather some sort of hybrid 
raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
arranged that way, or some form that doesn't nicely fall into a well 
defined raid level categorization.

But still, opinions can differ.  Point well made... and taken. =:^)

>> Surprisingly, after shutting everything down, getting a new AC, and
>> letting the system cool for a few hours, it pretty much all came back
>> to life, including the CPU(s) (that was pre-multi-core, but I don't
>> remember whether it was my dual socket original Opteron, or
>> pre-dual-socket for me as well) which I had feared would be dead.
> 
> CPUs have had thermal shutdown for a long time.  When a CPU lacks such
> controls (as some buggy Opteron chips did a few years ago) it makes the
> IT news.

That was certainly some years ago, and I remember for awhile, AMD Athlons 
didn't have thermal shutdown yet, while Intel CPUs of the time did.  And 
that was an AMD CPU as I've run mostly AMD (with only specific 
exceptions) for literally decades now.  But what I don't recall for sure is 
whether it was my original AMD Athlon (500 MHz), or the Athlon C @ 1.2 GHz, or 
the dual Opteron 242s I ran for several years.  If it was the original 
Athlon, it wouldn't have had thermal shutdown.  If it was the Opterons I 
think they did, but I think the Athlon Cs were in the period when Intel 
had introduced thermal shutdown but AMD hadn't, and Tom's Hardware among 
others had dramatic videos of just exactly what happened if one actually 
tried to run the things without cooling, compared to running an Intel of 
the period.

But I remember being rather surprised that the CPU(s) was/were unharmed, 
which means it very well could have been the Athlon C era, and I had seen 
the dramatic videos and knew my CPU wasn't protected.

> I'd like to be able to run a combination of "dup" and RAID-1 for
> metadata. ZFS has a "copies" option, it would be good if we could do
> that.

Well, if N-way-mirroring were possible, one could do more or less just 
that easily enough with suitable partitioning and setting the data vs 
metadata number of mirrors as appropriate... but of course with only two-
way-mirroring and dup as choices... the only way to do it would be 
layering btrfs atop something else, say md/raid.  And without real-time 
checksumming verification at the md/raid level...
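
For reference, what we can already do today is pick the metadata and data 
profiles independently, at mkfs time or later via balance filters (sketch 
only, devices are placeholders):

mkfs.btrfs -m raid1 -d single /dev/sdX /dev/sdY
btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt

Just not dup plus raid1 for the same chunk type, nor more than two copies.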

> I use BTRFS for all my backups too.  I think that the chance of data
> patterns triggering filesystem bugs that break backups as well as
> primary storage is vanishingly small.  The chance of such bugs being
> latent for long enough that I can't easily recreate the data isn't worth
> worrying about.

The fact that my primary filesystems and their first backups are btrfs 
raid1 on dual SSDs, while secondary backups are on spinning rust, does 
factor into my calculations here.

I ran reiserfs for many years, since I first switched to Linux full time 
in the early kernel 2.4 era in fact, and while it had its problems early 
on, since the introduction of orde