Re: Help with space

2014-05-02 Thread Duncan
Russell Coker posted on Fri, 02 May 2014 11:48:07 +1000 as excerpted:

 On Thu, 1 May 2014, Duncan 1i5t5.dun...@cox.net wrote:
 
 Am I missing something or is it impossible to do a disk replace on BTRFS
 right now?
 
 I can delete a device, I can add a device, but I'd like to replace a
 device.

You're missing something... but it's an easy thing to miss, as I almost 
missed it too even tho I was sure it was there.

Something tells me btrfs replace (not device replace, simply replace) 
should be moved to btrfs device replace...
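
For reference, the invocation I almost missed is roughly this (device 
paths and mountpoint here are just placeholders):

  btrfs replace start /dev/old-disk /dev/new-disk /mnt
  btrfs replace status /mnt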

 http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf
 
 Whether a true RAID-1 means just 2 copies or N copies is a matter of
 opinion. Papers such as the above seem to clearly imply that RAID-1 is
 strictly 2 copies of data.

Thanks for that link. =:^)

My position would be that reflects the original, but not the modern, 
definition.  The paper seems to describe as raid1 what would later come 
to be called raid1+0, which quickly morphed into raid10, leaving the 
raid1 description only covering pure mirror-raid.

And even then, the paper says mirrors in spots without specifically 
defining it as (only) two mirrors, but in others it seems to /assume/, 
without further explanation, just two mirrors.  So I'd argue that even 
then the definition of raid1 allowed more than two mirrors, but that it 
just so happened that the examples and formulae given dealt with only two 
mirrors.

Tho certainly I can see the room for differing opinions on the matter as 
well.

 I don't have a strong opinion on how many copies of data can be involved
 in a RAID-1, but I think that there's no good case to claim that only 2
 copies means that something isn't true RAID-1.

Well, I'd say two copies if it's only two devices in the raid1... would 
be true raid1.  But if it's say four devices in the raid1, as is 
certainly possible with btrfs raid1, that if it's not mirrored 4-way 
across all devices, it's not true raid1, but rather some sort of hybrid 
raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
arranged that way, or some form that doesn't nicely fall into a well 
defined raid level categorization.

But still, opinions can differ.  Point well made... and taken. =:^)

 Surprisingly, after shutting everything down, getting a new AC, and
 letting the system cool for a few hours, it pretty much all came back
 to life, including the CPU(s) (that was pre-multi-core, but I don't
 remember whether it was my dual socket original Opteron, or
 pre-dual-socket for me as well) which I had feared would be dead.
 
 CPUs have had thermal shutdown for a long time.  When a CPU lacks such
 controls (as some buggy Opteron chips did a few years ago) it makes the
 IT news.

That was certainly some years ago, and I remember for awhile, AMD Athlons 
didn't have thermal shutdown yet, while Intel CPUs of the time did.  And 
that was an AMD CPU as I've run mostly AMD (with only specific 
exceptions) for literally decades, now.  But what IDR for sure is whether 
it was my original AMD Athlon (500 MHz), or the Athlon C @ 1.2 GHz, or 
the dual Opteron 242s I ran for several years.  If it was the original 
Athlon, it wouldn't have had thermal shutdown.  If it was the Opterons I 
think they did, but I think the Athlon Cs were in the period when Intel 
had introduced thermal shutdown but AMD hadn't, and Tom's Hardware among 
others had dramatic videos of just exactly what happened if one actually 
tried to run the things without cooling, compared to running an Intel of 
the period.

But I remember being rather surprised that the CPU(s) was/were unharmed, 
which means it very well could have been the Athlon C era, and I had seen 
the dramatic videos and knew my CPU wasn't protected.

 I'd like to be able to run a combination of dup and RAID-1 for
 metadata. ZFS has a copies option, it would be good if we could do
 that.

Well, if N-way-mirroring were possible, one could do more or less just 
that easily enough with suitable partitioning and setting the data vs 
metadata number of mirrors as appropriate... but of course with only two-
way-mirroring and dup as choices... the only way to do it would be 
layering btrfs atop something else, say md/raid.  And without real-time 
checksumming verification at the md/raid level...
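
As a rough sketch of what that layering might look like (device names 
hypothetical, and note that md picks a mirror to read from without any 
checksum verification, which is exactly the weakness above):

  # three-way md mirror underneath, btrfs with dup metadata on top
  mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2
  mkfs.btrfs -m dup -d single /dev/md0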

 I use BTRFS for all my backups too.  I think that the chance of data
 patterns triggering filesystem bugs that break backups as well as
 primary storage is vanishingly small.  The chance of such bugs being
 latent for long enough that I can't easily recreate the data isn't worth
 worrying about.

The fact that my primary filesystems and their first backups are btrfs 
raid1 on dual SSDs, while secondary backups are on spinning rust, does 
factor into my calculations here.

I ran reiserfs for many years, since I first switched to Linux full time 
in the early kernel 2.4 era in fact, and while it had its problems early 
on, since the introduction of ordered data mode in IIRC 2.6.16 or some 
such, 

Re: Negative qgroup sizes

2014-05-02 Thread Alin Dobre
Thanks for the response, Duncan.

On 01/05/14 17:58, Duncan wrote:
 
 Tho you are slightly outdated on your btrfs-progs version, 3.14.1 being 
 current.  But I think the code in question is kernel code and the progs 
 simply report it, so I don't think that can be the problem in this case.

Yes, I'm aware that the 3.14 version of btrfs-progs was already out, but
that has only been the case for a couple of weeks, and I'm pretty sure that
the kernel code (which does the real time accounting) is what's broken.

 So if you are doing snapshots, you can try not doing them (switching to 
 conventional backup if necessary) and see if that stabilizes your 
 numbers.  If so, you know there's still more problems in that area.
 
 Of course if the subvolumes involved aren't snapshotted, then the problem 
 must be elsewhere, but I do know the snapshotting case /is/ reasonably 
 difficult to get right... while staying within a reasonable performance 
 envelope at least.
 

I have already searched and found some patches around this issue, but I
thought I'd also mention it on this mailing list, hoping that I had
somehow missed something. The subvolumes are very likely snapshotted, so
this might indeed be the case.
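
For anyone wanting to reproduce, the snapshot list and the accounting can
be checked with something like the following (mountpoint is just an example):

  btrfs subvolume list -s /mnt   # snapshots only
  btrfs qgroup show /mnt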

Cheers,
Alin.


Re: Help with space

2014-05-02 Thread Brendan Hide

On 02/05/14 10:23, Duncan wrote:

 Russell Coker posted on Fri, 02 May 2014 11:48:07 +1000 as excerpted:

  On Thu, 1 May 2014, Duncan 1i5t5.dun...@cox.net wrote:
 [snip]
  http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf

  Whether a true RAID-1 means just 2 copies or N copies is a matter of
  opinion. Papers such as the above seem to clearly imply that RAID-1 is
  strictly 2 copies of data.

 Thanks for that link. =:^)

 My position would be that reflects the original, but not the modern,
 definition.  The paper seems to describe as raid1 what would later come
 to be called raid1+0, which quickly morphed into raid10, leaving the
 raid1 description only covering pure mirror-raid.

Personally I'm flexible on using the terminology in day-to-day 
operations and discussion, since the end result is close enough. But ...


The definition of RAID 1 is still only a mirror of two devices. As far 
as I'm aware, Linux's mdraid is the only raid system in the world that 
allows N-way mirroring while still referring to it as RAID1. Due to 
the way it handles data in chunks, and also due to its rampant layering 
violations, *technically* btrfs's RAID-like features are not RAID.


To differentiate from RAID, we're already using lowercase raid and, 
in the long term, some of us are also looking to do away with the raid{x} 
terms altogether with what Hugo and I last termed csp notation. 
Changing the terminology is important - but it is particularly non-urgent.


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



csum failed that was not detected by scrub

2014-05-02 Thread Jaap Pieroen
Hi all,

I completed a full scrub:
root@nasbak:/home/jpieroen# btrfs scrub status /home/
scrub status for 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
scrub started at Wed Apr 30 08:30:19 2014 and finished after 144131 seconds
total bytes scrubbed: 4.76TiB with 0 errors

Then tried to remove a device:
root@nasbak:/home/jpieroen# btrfs device delete /dev/sdb /home

This triggered bug_on, with the following error in dmesg: csum failed
ino 258 off 1395560448 csum 2284440321 expected csum 319628859

How can there still be csum failures directly after a scrub?
If I rerun the scrub it still won't find any errors. I know this,
because I've had the same issue 3 times in a row. Each time running a
scrub and still being unable to remove the device.

Kind Regards,
Jaap

--
Details:

root@nasbak:/home/jpieroen#   uname -a
Linux nasbak 3.14.1-031401-generic #201404141220 SMP Mon Apr 14
16:21:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

root@nasbak:/home/jpieroen#   btrfs --version
Btrfs v3.14.1

root@nasbak:/home/jpieroen#   btrfs fi df /home
Data, RAID5: total=4.57TiB, used=4.55TiB
System, RAID1: total=32.00MiB, used=352.00KiB
Metadata, RAID1: total=7.00GiB, used=5.59GiB

root@nasbak:/home/jpieroen# btrfs fi show
Label: 'btrfs_storage'  uuid: 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
Total devices 6 FS bytes used 4.56TiB
devid    1 size 1.82TiB used 1.31TiB path /dev/sde
devid    2 size 1.82TiB used 1.31TiB path /dev/sdf
devid    3 size 1.82TiB used 1.31TiB path /dev/sdg
devid    4 size 931.51GiB used 25.00GiB path /dev/sdb
devid    6 size 2.73TiB used 994.03GiB path /dev/sdh
devid    7 size 2.73TiB used 994.03GiB path /dev/sdi

Btrfs v3.14.1

jpieroen@nasbak:~$ dmesg
[227248.656438] BTRFS info (device sdi): relocating block group
9735225016320 flags 129
[227261.713860] BTRFS info (device sdi): found 9 extents
[227264.531019] BTRFS info (device sdi): found 9 extents
[227265.011826] BTRFS info (device sdi): relocating block group
76265029632 flags 129
[227274.052249] BTRFS info (device sdi): csum failed ino 258 off
1395560448 csum 2284440321 expected csum 319628859
[227274.052354] BTRFS info (device sdi): csum failed ino 258 off
1395564544 csum 3646299263 expected csum 319628859
[227274.052402] BTRFS info (device sdi): csum failed ino 258 off
1395568640 csum 281259278 expected csum 319628859
[227274.052449] BTRFS info (device sdi): csum failed ino 258 off
1395572736 csum 2594807184 expected csum 319628859
[227274.052492] BTRFS info (device sdi): csum failed ino 258 off
1395576832 csum 4288971971 expected csum 319628859
[227274.052537] BTRFS info (device sdi): csum failed ino 258 off
1395580928 csum 752615894 expected csum 319628859
[227274.052581] BTRFS info (device sdi): csum failed ino 258 off
1395585024 csum 3828951500 expected csum 319628859
[227274.061279] [ cut here ]
[227274.061354] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
[227274.061445] invalid opcode:  [#1] SMP
[227274.061509] Modules linked in: cuse deflate
[227274.061573] BTRFS info (device sdi): csum failed ino 258 off
1395560448 csum 2284440321 expected csum 319628859
[227274.061707]  ctr twofish_generic twofish_x86_64_3way
twofish_x86_64 twofish_common camellia_generic camellia_x86_64
serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper
blowfish_generic blowfish_x86_64 blowfish_common cast5_generic
cast_common ablk_helper cryptd des_generic cmac xcbc rmd160
crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
fscache dm_crypt ip6t_REJECT ppdev xt_hl ip6t_rt nf_conntrack_ipv6
nf_defrag_ipv6 ipt_REJECT xt_comment xt_LOG kvm xt_recent microcode
xt_multiport xt_limit xt_tcpudp psmouse serio_raw xt_addrtype k10temp
edac_core ipt_MASQUERADE edac_mce_amd iptable_nat nf_nat_ipv4
sp5100_tco nf_conntrack_ipv4 nf_defrag_ipv4 ftdi_sio i2c_piix4
usbserial xt_conntrack ip6table_filter ip6_tables joydev
nf_conntrack_netbios_ns nf_conntrack_broadcast snd_hda_codec_via
nf_nat_ftp snd_hda_codec_hdmi nf_nat snd_hda_codec_generic
nf_conntrack_ftp nf_conntrack snd_hda_intel iptable_filter
ir_lirc_codec(OF) lirc_dev(OF) ip_tables snd_hda_codec
ir_mce_kbd_decoder(OF) x_tables snd_hwdep ir_sony_decoder(OF)
rc_tbs_nec(OF) ir_jvc_decoder(OF) snd_pcm ir_rc6_decoder(OF)
ir_rc5_decoder(OF) saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF)
ir_nec_decoder(OF) tbs6923fe(POF) tbs6985se(POF) tbs6928se(POF)
tbs6982se(POF) tbs6991fe(POF) tbs6618fe(POF) saa716x_core(OF)
tbs6922fe(POF) tbs6928fe(POF) tbs6991se(POF) stv090x(OF) dvb_core(OF)
rc_core(OF) snd_timer snd soundcore asus_atk0110 parport_pc shpchp
mac_hid lp parport btrfs xor raid6_pq pata_acpi hid_generic usbhid hid
usb_storage radeon pata_atiixp r8169 mii i2c_algo_bit sata_sil24 ttm
drm_kms_helper drm ahci libahci wmi
[227274.064118] CPU: 1 PID: 15543 Comm: btrfs-endio-4 Tainted: PF
O 3.14.1-031401-generic #201404141220
[227274.064246] Hardware name: System manufacturer System Product
Name/M4A78LT-M, BIOS 

Unable to boot

2014-05-02 Thread George Pochiscan
Hello,

I have a problem with a server with Fedora 20 and BTRFS. This server had 
frequent hard restarts before the filesystem got corrupted and we are unable 
to boot it.

We have an HP ProLiant server with 4 disks @ 1TB each and software RAID 5.
It had Debian installed (I don't know the version) and right now I'm using 
Fedora 20 Live to try to rescue the system.

When we try btrfsck /dev/md127 I have a lot of checksum errors, and the output 
is: 

Checking filesystem on /dev/md127
UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
checking extents
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
Csum didn't match
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
Csum didn't match
-

extent buffer leak: start 1006686208 len 4096
found 32039247396 bytes used err is -22
total csum bytes: 41608612
total tree bytes: 388857856
total fs tree bytes: 310124544
total extent tree bytes: 22016000
btree space waste bytes: 126431234
file data blocks allocated: 47227326464
 referenced 42595635200
Btrfs v3.12



When i attempt to repair i have the following error:
-
Backref 1005817856 parent 5 root 5 not found in extent tree
backpointer mismatch on [1005817856 4096]
owner ref check failed [1006686208 4096]
repaired damaged extent references
Failed to find [1000525824, 168, 4096]
btrfs unable to find ref byte nr 1000525824 parent 0 root 1  owner 1 offset 0
btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
Aborted





I have installed btrfs version 3.12

Linux localhost 3.11.10-301.fc20.x86_64 #1 SMP Thu Dec 5 14:01:17 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux

[root@localhost liveuser]# btrfs fi show
Label: none  uuid: e068faf0-2c16-4566-9093-e6d1e21a5e3c
Total devices 1 FS bytes used 40.04GiB
devid    1 size 1.82TiB used 43.04GiB path /dev/md127
Btrfs v3.12


Please advise.

Thank you,
George Pochiscan


Re: csum failed that was not detected by scrub

2014-05-02 Thread Duncan
Jaap Pieroen posted on Fri, 02 May 2014 11:42:35 +0200 as excerpted:

 I completed a full scrub:
 root@nasbak:/home/jpieroen# btrfs scrub status /home/
 scrub status for 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
 scrub started at Wed Apr 30 08:30:19 2014
 and finished after 144131 seconds
 total bytes scrubbed: 4.76TiB with 0 errors
 
 Then tried to remove a device:
 root@nasbak:/home/jpieroen# btrfs device delete /dev/sdb /home
 
 This triggered bug_on, with the following error in dmesg: csum failed
 ino 258 off 1395560448 csum 2284440321 expected csum 319628859
 
 How can there still be csum failures directly after a scrub?

Simple enough, really...

 root@nasbak:/home/jpieroen#   btrfs fi df /home
 Data, RAID5: total=4.57TiB, used=4.55TiB
 System, RAID1: total=32.00MiB, used=352.00KiB
 Metadata, RAID1: total=7.00GiB, used=5.59GiB

To those that know the details, this tells the story.

Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the 
incomplete bits.  btrfs scrub doesn't know how to deal with raid5/6 
properly just yet.

While the operational bits of raid5/6 support are there (parity is 
calculated and written), scrub and recovery from a lost device are not 
yet code complete.  Thus, it's effectively a slower, lower capacity raid0 
without scrub support at this point, except that when the code is 
complete, you'll get an automatic free upgrade to full raid5 or raid6, 
because the operational bits have been working since they were 
introduced; it's just the recovery and scrub bits that were bad, making 
it effectively a raid0 in reliability terms: lose one and you've lost 
them all.

That's the big picture anyway.  Marc Merlin recently did quite a bit of 
raid5/6 testing and there's a page on the wiki now with what he found.  
Additionally, I saw a scrub support for raid5/6 modes patch on the list 
recently, but while it may be in integration, I believe it's too new to 
have reached release yet.

Wiki, for memory or bookmark: https://btrfs.wiki.kernel.org

Direct user documentation link for bookmark:

https://btrfs.wiki.kernel.org/index.php/Main_Page#Guides_and_usage_information

The raid5/6 page (which I didn't otherwise see conveniently linked, I dug 
it out of the recent changes list since I knew it was there from on-list 
discussion):

https://btrfs.wiki.kernel.org/index.php/RAID56


@ Marc or Hugo or someone with a wiki account:  Can this be more visibly 
linked from the user-docs contents, added to the user docs category list, 
and probably linked from at least the multiple devices and (for now) the 
gotchas pages?

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: csum failed that was not detected by scrub

2014-05-02 Thread Shilong Wang
Hello,


2014-05-02 17:42 GMT+08:00 Jaap Pieroen j...@pieroen.nl:
 Hi all,

 I completed a full scrub:
 root@nasbak:/home/jpieroen# btrfs scrub status /home/
 scrub status for 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
 scrub started at Wed Apr 30 08:30:19 2014 and finished after 144131 seconds
 total bytes scrubbed: 4.76TiB with 0 errors

 Then tried to remove a device:
 root@nasbak:/home/jpieroen# btrfs device delete /dev/sdb /home

 This triggered bug_on, with the following error in dmesg: csum failed
 ino 258 off 1395560448 csum 2284440321 expected csum 319628859

 How can there still be csum failures directly after a scrub?
 If I rerun the scrub it still won't find any errors. I know this,
 because I've had the same issue 3 times in a row. Each time running a
 scrub and still being unable to remove the device.

There is a known RAID5/6 bug; I sent a patch to address this problem.
Could you please double check whether your kernel source includes the
following commit:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3b080b2564287be91605bfd1d5ee985696e61d3c

With that commit, RAID5/6 scrub should detect checksum mismatches, but it 
can not fix the errors yet.
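
One quick way to check whether a local kernel git tree already contains it,
for example (commit id taken from the link above):

  git merge-base --is-ancestor 3b080b2564287be91605bfd1d5ee985696e61d3c HEAD \
      && echo "commit is included"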

Thanks,
Wang

 Kind Regards,
 Jaap

 --
 Details:

 root@nasbak:/home/jpieroen#   uname -a
 Linux nasbak 3.14.1-031401-generic #201404141220 SMP Mon Apr 14
 16:21:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

 root@nasbak:/home/jpieroen#   btrfs --version
 Btrfs v3.14.1

 root@nasbak:/home/jpieroen#   btrfs fi df /home
 Data, RAID5: total=4.57TiB, used=4.55TiB
 System, RAID1: total=32.00MiB, used=352.00KiB
 Metadata, RAID1: total=7.00GiB, used=5.59GiB

 root@nasbak:/home/jpieroen# btrfs fi show
 Label: 'btrfs_storage'  uuid: 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
 Total devices 6 FS bytes used 4.56TiB
 devid    1 size 1.82TiB used 1.31TiB path /dev/sde
 devid    2 size 1.82TiB used 1.31TiB path /dev/sdf
 devid    3 size 1.82TiB used 1.31TiB path /dev/sdg
 devid    4 size 931.51GiB used 25.00GiB path /dev/sdb
 devid    6 size 2.73TiB used 994.03GiB path /dev/sdh
 devid    7 size 2.73TiB used 994.03GiB path /dev/sdi

 Btrfs v3.14.1

 jpieroen@nasbak:~$ dmesg
 [227248.656438] BTRFS info (device sdi): relocating block group
 9735225016320 flags 129
 [227261.713860] BTRFS info (device sdi): found 9 extents
 [227264.531019] BTRFS info (device sdi): found 9 extents
 [227265.011826] BTRFS info (device sdi): relocating block group
 76265029632 flags 129
 [227274.052249] BTRFS info (device sdi): csum failed ino 258 off
 1395560448 csum 2284440321 expected csum 319628859
 [227274.052354] BTRFS info (device sdi): csum failed ino 258 off
 1395564544 csum 3646299263 expected csum 319628859
 [227274.052402] BTRFS info (device sdi): csum failed ino 258 off
 1395568640 csum 281259278 expected csum 319628859
 [227274.052449] BTRFS info (device sdi): csum failed ino 258 off
 1395572736 csum 2594807184 expected csum 319628859
 [227274.052492] BTRFS info (device sdi): csum failed ino 258 off
 1395576832 csum 4288971971 expected csum 319628859
 [227274.052537] BTRFS info (device sdi): csum failed ino 258 off
 1395580928 csum 752615894 expected csum 319628859
 [227274.052581] BTRFS info (device sdi): csum failed ino 258 off
 1395585024 csum 3828951500 expected csum 319628859
 [227274.061279] [ cut here ]
 [227274.061354] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116!
 [227274.061445] invalid opcode:  [#1] SMP
 [227274.061509] Modules linked in: cuse deflate
 [227274.061573] BTRFS info (device sdi): csum failed ino 258 off
 1395560448 csum 2284440321 expected csum 319628859
 [227274.061707]  ctr twofish_generic twofish_x86_64_3way
 twofish_x86_64 twofish_common camellia_generic camellia_x86_64
 serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper
 blowfish_generic blowfish_x86_64 blowfish_common cast5_generic
 cast_common ablk_helper cryptd des_generic cmac xcbc rmd160
 crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
 fscache dm_crypt ip6t_REJECT ppdev xt_hl ip6t_rt nf_conntrack_ipv6
 nf_defrag_ipv6 ipt_REJECT xt_comment xt_LOG kvm xt_recent microcode
 xt_multiport xt_limit xt_tcpudp psmouse serio_raw xt_addrtype k10temp
 edac_core ipt_MASQUERADE edac_mce_amd iptable_nat nf_nat_ipv4
 sp5100_tco nf_conntrack_ipv4 nf_defrag_ipv4 ftdi_sio i2c_piix4
 usbserial xt_conntrack ip6table_filter ip6_tables joydev
 nf_conntrack_netbios_ns nf_conntrack_broadcast snd_hda_codec_via
 nf_nat_ftp snd_hda_codec_hdmi nf_nat snd_hda_codec_generic
 nf_conntrack_ftp nf_conntrack snd_hda_intel iptable_filter
 ir_lirc_codec(OF) lirc_dev(OF) ip_tables snd_hda_codec
 ir_mce_kbd_decoder(OF) x_tables snd_hwdep ir_sony_decoder(OF)
 rc_tbs_nec(OF) ir_jvc_decoder(OF) snd_pcm ir_rc6_decoder(OF)
 ir_rc5_decoder(OF) saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF)
 ir_nec_decoder(OF) tbs6923fe(POF) tbs6985se(POF) tbs6928se(POF)
 tbs6982se(POF) tbs6991fe(POF) tbs6618fe(POF) saa716x_core(OF)
 

Re: [PATCH 07/14] btrfs-progs: Print more info about device sizes

2014-05-02 Thread David Sterba
On Wed, Apr 30, 2014 at 07:38:00PM +0200, Goffredo Baroncelli wrote:
 On 04/30/2014 03:37 PM, David Taylor wrote:
  On Wed, 30 Apr 2014, Frank Kingswood wrote:
  On 30/04/14 13:11, David Sterba wrote:
  On Wed, Apr 30, 2014 at 01:39:27PM +0200, Goffredo Baroncelli wrote:
 
  I found the FS occupied terms a bit unclear.
 
  We're running out of terms to describe and distinguish the space that
  the filesystem uses.
 
  'occupied' seemed like a good choice to me, though it may not be obvious
 
  The space that the filesystem uses in total seems to me is called the
  size. It has nothing to do with utilization.
 
  /dev/sda6, ID: 2
  Device size:10.00GiB
  Filesystem size: 5.00GiB
  
  FS size was what I was about to suggest, before I saw your reply.
 
 Pay attention that this value is not the Filesystem size, 
 but the maximum space of THE DEVICE that the filesystem is allowed to use.

I agree that plain 'Filesystem size' could be misleading; using the same
term that has an established meaning can cause misunderstandings in
bug reports.


Re: [PATCH 1/4] Btrfs-progs: send, bump stream version

2014-05-02 Thread David Sterba
On Tue, Apr 15, 2014 at 05:40:48PM +0100, Filipe David Borba Manana wrote:
 This increases the send stream version from version 1 to version 2, adding
 2 new commands:
 
 1) total data size - used to tell the receiver how much file data the stream
will add or update;
 
 2) fallocate - used to pre-allocate space for files and to punch holes in 
 files.
 
 This is preparation work for subsequent changes that implement the new 
 features
 (computing total data size and use fallocate for better performance).
 
 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com

The changes in the v2/3/4 look good, thanks.  Patches added to next
integration.


Re: [PATCH 6/6 v2] Btrfs: add send_stream_version attribute to sysfs

2014-05-02 Thread David Sterba
On Sun, Apr 20, 2014 at 10:40:03PM +0100, Filipe David Borba Manana wrote:
 So that applications can find out what's the highest send stream
 version supported/implemented by the running kernel:
 
 $ cat /sys/fs/btrfs/send/stream_version
 2
 
 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
 ---
 
 V2: Renamed /sys/fs/btrfs/send_stream_version to /sys/fs/btrfs/send/stream_version,
 as in the future it might be useful to add other sysfs attributes related to
 send (other ro information or tunables like internal buffer sizes, etc).

Sounds good, I don't see any issue with the separate directory. Mixing
it with /sys/fs/btrfs/features does not seem suitable for that if you
intend adding more entries.

Reviewed-by: David Sterba dste...@suse.cz


Re: [PATCH 6/6 v2] Btrfs: add send_stream_version attribute to sysfs

2014-05-02 Thread Filipe David Manana
On Fri, May 2, 2014 at 4:46 PM, David Sterba dste...@suse.cz wrote:
 On Sun, Apr 20, 2014 at 10:40:03PM +0100, Filipe David Borba Manana wrote:
 So that applications can find out what's the highest send stream
 version supported/implemented by the running kernel:

 $ cat /sys/fs/btrfs/send/stream_version
 2

 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
 ---

 V2: Renamed /sys/fs/btrfs/send_stream_version to /sys/fs/btrfs/send/stream_version,
 as in the future it might be useful to add other sysfs attributes related to
 send (other ro information or tunables like internal buffer sizes, etc).

 Sounds good, I don't see any issue with the separate directory. Mixing
 it with /sys/fs/btrfs/features does not seem suitable for that if you
 intend adding more entries.

Yeah, I only didn't mix it with the features subdir because that
relates to features that are settable, plus there's 2 versions of it,
one global and one per fs (uuid) subdirectory (and it felt odd to me
to add it to one of those subdirs and not the other).

Thanks David


 Reviewed-by: David Sterba dste...@suse.cz



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


Re: [PATCH] Btrfs: do not increment on bio_index one by one

2014-05-02 Thread David Sterba
On Tue, Apr 29, 2014 at 01:07:58PM +0800, Liu Bo wrote:
 'bio_index' is just an index; it's really not necessary to increment it
 one by one.
 
 Signed-off-by: Liu Bo bo.li@oracle.com

Reviewed-by: David Sterba dste...@suse.cz


Re:

2014-05-02 Thread Jaap Pieroen
Duncan 1i5t5.duncan at cox.net writes:

 
 To those that know the details, this tells the story.
 
 Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the 
 incomplete bits.  btrfs scrub doesn't know how to deal with raid5/6 
 properly just yet.
 
 While the operational bits of raid5/6 support are there (parity is 
 calculated and written), scrub and recovery from a lost device are not 
 yet code complete.  Thus, it's effectively a slower, lower capacity raid0 
 without scrub support at this point, except that when the code is 
 complete, you'll get an automatic free upgrade to full raid5 or raid6, 
 because the operational bits have been working since they were 
 introduced; it's just the recovery and scrub bits that were bad, making 
 it effectively a raid0 in reliability terms: lose one and you've lost 
 them all.
 
 That's the big picture anyway.  Marc Merlin recently did quite a bit of 
 raid5/6 testing and there's a page on the wiki now with what he found.  
 Additionally, I saw a scrub support for raid5/6 modes patch on the list 
 recently, but while it may be in integration, I believe it's too new to 
 have reached release yet.
 
 Wiki, for memory or bookmark: https://btrfs.wiki.kernel.org
 
 Direct user documentation link for bookmark:
 
 https://btrfs.wiki.kernel.org/index.php/Main_Page#Guides_and_usage_information
 
 The raid5/6 page (which I didn't otherwise see conveniently linked, I dug 
 it out of the recent changes list since I knew it was there from on-list 
 discussion):
 
 https://btrfs.wiki.kernel.org/index.php/RAID56
 
 @ Marc or Hugo or someone with a wiki account:  Can this be more visibly 
 linked from the user-docs contents, added to the user docs category list, 
 and probably linked from at least the multiple devices and (for now) the 
 gotchas pages?
 

So raid5 is much more useless than I assumed. I read Marc's blog and
figured that btrfs was ready enough.

I'm really in trouble now. I tried to get rid of raid5 by doing a convert
balance to raid1. But of course this triggered the same issue. And now
I have a dead system, because the first thing btrfs does after mounting
is continue the balance, which will crash the system and send me into
a vicious loop.

- How can I stop btrfs from continuing balancing?
- How can I salvage this situation and convert to raid1?

Unfortunately I have few spare drives left. Not enough to contain
4.7TiB of data... :(






Re: csum failed that was not detected by scrub

2014-05-02 Thread Jaap Pieroen
Shilong Wang wangshilong1991 at gmail.com writes:

 
 Hello,
 
 There is a known RAID5/6 bug, i sent a patch to address this problem.
 Could you please double check if your kernel source includes the
 following commit:
 
 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3b080b2564287be91605bfd1d5ee985696e61d3c
 
 RAID5/6 should detect checksum mismatch, it can not fix errors now.
 
 Thanks,
 Wang

Your patch seems to be in 3.15rc1:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.15-rc1-trusty/CHANGES

I tried rc3, but that made my system crash on boot... I'm having bad luck.



Re: Help with space

2014-05-02 Thread Chris Murphy

On May 2, 2014, at 2:23 AM, Duncan 1i5t5.dun...@cox.net wrote:
 
 Something tells me btrfs replace (not device replace, simply replace) 
 should be moved to btrfs device replace…

The syntax for btrfs device is different though; replace is like balance: 
btrfs balance start and btrfs replace start. And you can also get a status on 
it. We don't (yet) have options to stop, start, resume, which could maybe come 
in handy for long rebuilds and a reboot is required (?) although maybe that 
just gets handled automatically: set it to pause, then unmount, then reboot, 
then mount and resume.
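
So today the two command families look roughly like this (mountpoint and 
devices are placeholders):

  btrfs replace start /dev/sdold /dev/sdnew /mnt
  btrfs replace status /mnt

  btrfs balance start /mnt
  btrfs balance pause /mnt
  btrfs balance resume /mnt
  btrfs balance status /mnt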

 Well, I'd say two copies if it's only two devices in the raid1... would 
 be true raid1.  But if it's say four devices in the raid1, as is 
 certainly possible with btrfs raid1, that if it's not mirrored 4-way 
 across all devices, it's not true raid1, but rather some sort of hybrid 
 raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
 arranged that way, or some form that doesn't nicely fall into a well 
 defined raid level categorization.

Well, md raid1 is always n-way. So if you use -n 3 and specify three devices, 
you'll get 3-way mirroring (3 mirrors). But I don't know any hardware raid that 
works this way. They all seem to treat raid 1 as strictly two devices. At 4 
devices it's raid10, and only in pairs.

Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
don't read code but based on how a 3 disk raid1 volume grows VDI files as it's 
filled it looks like 1GB chunks are copied like this

Disk1   Disk2   Disk3
134 124 235
679 578 689

So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
18GB of space, 6GB on each drive. You can't do this with any other raid1 as far 
as I know. You do definitely run out of space on one disk first though because 
of uneven metadata to data chunk allocation.

Anyway I think we're off the rails with raid1 nomenclature as soon as we have 3 
devices. It's probably better to call it replication, with an assumed default 
of 2 replicates unless otherwise specified.

There's definitely a benefit to a 3 device volume with 2 replicates, efficiency 
wise. As soon as we go to four disks 2 replicates it makes more sense to do 
raid10, although I haven't tested odd device raid10 setups so I'm not sure what 
happens.


Chris Murphy



Re: Unable to boot

2014-05-02 Thread Chris Murphy

On May 2, 2014, at 4:00 AM, George Pochiscan george.pochis...@sphs.ro wrote:

 Hello,
 
 I have a problem with a server with Fedora 20 and BTRFS. This server had 
 frequent hard restarts before the filesystem got corrupted and we are unable 
 to boot it.
 
 We have an HP ProLiant server with 4 disks @ 1TB each and software RAID 5.
 It had Debian installed (I don't know the version) and right now I'm using 
 Fedora 20 Live to try to rescue the system.

Fedora 20 Live has kernel 3.11.10 and btrfs-progs 
0.20.rc1.20131114git9f0c53f-1.fc20. So the general rule of thumb without 
knowing exactly what the problem and solution is, is to try a much newer kernel 
and btrfs-progs, like a Fedora Rawhide live media. These are built daily, but 
don't always succeed so you can go here to find the latest of everything:

https://apps.fedoraproject.org/releng-dash/

Find Fedora Live Desktop or Live KDE and click on details. Click the green link 
under descendants > livecd. And then under Output listing you'll see an ISO you 
can download, the one there right now is 
Fedora-Live-Desktop-x86_64-rawhide-20140502.iso - but of course this changes 
daily.

You might want to boot with kernel parameter slub_debug=- (that's a minus 
symbol) because all but Monday built Rawhide kernels have a bunch of kernel 
debug options enabled which makes it quite slow.


 
 When we try btrfsck /dev/md127 I have a lot of checksum errors, and the 
 output is: 
 
 Checking filesystem on /dev/md127
 UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
 checking extents
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 Csum didn't match
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 Csum didn't match
 -
 
 extent buffer leak: start 1006686208 len 4096
 found 32039247396 bytes used err is -22
 total csum bytes: 41608612
 total tree bytes: 388857856
 total fs tree bytes: 310124544
 total extent tree bytes: 22016000
 btree space waste bytes: 126431234
 file data blocks allocated: 47227326464
 referenced 42595635200
 Btrfs v3.12


I suggest a recent Rawhide build. And I suggest just trying to mount the file 
system normally first, and post anything that appears in dmesg. And if the 
mount fails, then try mount option -o recovery, and also post any dmesg 
messages from that too, and note whether or not it mounts. Finally if that 
doesn't work either then see if -o ro,recovery works and what kernel messages 
you get.
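
In other words, something along these lines, in order (using the md device 
and a mountpoint as examples; adjust to your setup):

  mount /dev/md127 /mnt; dmesg | tail -n 50
  mount -o recovery /dev/md127 /mnt
  mount -o ro,recovery /dev/md127 /mnt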


 
 
 
 When i attempt to repair i have the following error:
 -
 Backref 1005817856 parent 5 root 5 not found in extent tree
 backpointer mismatch on [1005817856 4096]
 owner ref check failed [1006686208 4096]
 repaired damaged extent references
 Failed to find [1000525824, 168, 4096]
 btrfs unable to find ref byte nr 1000525824 parent 0 root 1  owner 1 offset 0
 btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
 Aborted
 

You really shouldn't use --repair right off the bat; it's not a recommended 
early step. You should try normal mounting with newer kernels first, then the 
recovery mount options. Sometimes the repair option makes things worse. 
I'm not sure what its safety status is as of v3.14.

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

Fedora includes btrfs-zero-log already so depending on the kernel messages you 
might try that before a btrfsck --repair.
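
If it does come to that, the usage is simply the following, on the unmounted 
filesystem (and again, only if the kernel messages point at log tree replay 
as the thing that's failing):

  btrfs-zero-log /dev/md127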



Chris Murphy



Re: Help with space

2014-05-02 Thread Hugo Mills
On Fri, May 02, 2014 at 01:21:50PM -0600, Chris Murphy wrote:
 
 On May 2, 2014, at 2:23 AM, Duncan 1i5t5.dun...@cox.net wrote:
  
  Something tells me btrfs replace (not device replace, simply replace) 
  should be moved to btrfs device replace…
 
 The syntax for btrfs device is different though; replace is like balance: 
 btrfs balance start and btrfs replace start. And you can also get a status on 
 it. We don't (yet) have options to stop, start, resume, which could maybe 
 come in handy for long rebuilds and a reboot is required (?) although maybe 
 that just gets handled automatically: set it to pause, then unmount, then 
 reboot, then mount and resume.
 
  Well, I'd say two copies if it's only two devices in the raid1... would 
  be true raid1.  But if it's say four devices in the raid1, as is 
  certainly possible with btrfs raid1, that if it's not mirrored 4-way 
  across all devices, it's not true raid1, but rather some sort of hybrid 
  raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
  arranged that way, or some form that doesn't nicely fall into a well 
  defined raid level categorization.
 
 Well, md raid1 is always n-way. So if you use -n 3 and specify three devices, 
 you'll get 3-way mirroring (3 mirrors). But I don't know any hardware raid 
 that works this way. They all seem to treat raid 1 as strictly two devices. At 4 
 devices it's raid10, and only in pairs.
 
 Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
 like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
 don't read code but based on how a 3 disk raid1 volume grows VDI files as 
 it's filled it looks like 1GB chunks are copied like this
 
 Disk1 Disk2   Disk3
 134   124 235
 679   578 689
 
 So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
 disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
 18GB of space, 6GB on each drive. You can't do this with any other raid1 as 
 far as I know. You do definitely run out of space on one disk first though 
 because of uneven metadata to data chunk allocation.

   The algorithm is that when the chunk allocator is asked for a block
group (in pairs of chunks for RAID-1), it picks the number of chunks
it needs, from different devices, in order of the device with the most
free space. So, with disks of size 8, 4, 4, you get:

Disk 1: 12345678
Disk 2: 1357
Disk 3: 2468

and with 8, 8, 4, you get:

Disk 1: 1234568A
Disk 2: 1234579A
Disk 3: 6789

   Hugo.

 Anyway I think we're off the rails with raid1 nomenclature as soon as we have 
 3 devices. It's probably better to call it replication, with an assumed 
 default of 2 replicates unless otherwise specified.
 
 There's definitely a benefit to a 3 device volume with 2 replicates, 
 efficiency wise. As soon as we go to four disks 2 replicates it makes more 
 sense to do raid10, although I haven't tested odd device raid10 setups so I'm 
 not sure what happens.
 
 
 Chris Murphy
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Prisoner unknown:  Return to Zenda. ---   




Re: Help with space

2014-05-02 Thread Chris Murphy

On May 2, 2014, at 3:08 PM, Hugo Mills h...@carfax.org.uk wrote:

 On Fri, May 02, 2014 at 01:21:50PM -0600, Chris Murphy wrote:
 
 On May 2, 2014, at 2:23 AM, Duncan 1i5t5.dun...@cox.net wrote:
 
 Something tells me btrfs replace (not device replace, simply replace) 
 should be moved to btrfs device replace…
 
 The syntax for btrfs device is different though; replace is like balance: 
 btrfs balance start and btrfs replace start. And you can also get a status 
 on it. We don't (yet) have options to stop, start, resume, which could maybe 
 come in handy for long rebuilds and a reboot is required (?) although maybe 
 that just gets handled automatically: set it to pause, then unmount, then 
 reboot, then mount and resume.
 
 Well, I'd say two copies if it's only two devices in the raid1... would 
 be true raid1.  But if it's say four devices in the raid1, as is 
 certainly possible with btrfs raid1, that if it's not mirrored 4-way 
 across all devices, it's not true raid1, but rather some sort of hybrid 
 raid,  raid10 (or raid01) if the devices are so arranged, raid1+linear if 
 arranged that way, or some form that doesn't nicely fall into a well 
 defined raid level categorization.
 
 Well, md raid1 is always n-way. So if you use -n 3 and specify three 
 devices, you'll get 3-way mirroring (3 mirrors). But I don't know any 
 hardware raid that works this way. They all seem to treat raid 1 as strictly 
 two devices. At 4 devices it's raid10, and only in pairs.
 
 Btrfs raid1 with 3+ devices is unique as far as I can tell. It is something 
 like raid1 (2 copies) + linear/concat. But that allocation is round robin. I 
 don't read code but based on how a 3 disk raid1 volume grows VDI files as 
 it's filled it looks like 1GB chunks are copied like this
 
 Disk1Disk2   Disk3
 134  124 235
 679  578 689
 
 So 1 through 9 each represent a 1GB chunk. Disk 1 and 2 each have a chunk 1; 
 disk 2 and 3 each have a chunk 2, and so on. Total of 9GB of data taking up 
 18GB of space, 6GB on each drive. You can't do this with any other raid1 as 
 far as I know. You do definitely run out of space on one disk first though 
 because of uneven metadata to data chunk allocation.
 
   The algorithm is that when the chunk allocator is asked for a block
 group (in pairs of chunks for RAID-1), it picks the number of chunks
 it needs, from different devices, in order of the device with the most
 free space. So, with disks of size 8, 4, 4, you get:
 
 Disk 1: 12345678
 Disk 2: 1357
 Disk 3: 2468
 
 and with 8, 8, 4, you get:
 
 Disk 1: 1234568A
 Disk 2: 1234579A
 Disk 3: 6789

Sure in my example I was assuming equal size disks. But it's a good example to 
have uneven disks also, because it exemplifies all the more the flexibility 
btrfs replication has, over alternatives, with odd numbered *and* uneven size 
disks.


Chris Murphy



btrfs raid56 Was: csum failed that was not detected by scrub

2014-05-02 Thread Duncan
Jaap Pieroen posted on Fri, 02 May 2014 17:48:13 + as excerpted:

 Duncan 1i5t5.duncan at cox.net writes:
 
 
 To those that know the details, this tells the story.
 
 Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the
 incomplete bits.  btrfs scrub doesn't know how to deal with raid5/6
 properly just yet.

 The raid5/6 page (which I didn't otherwise see conveniently linked, I
 dug it out of the recent changes list since I knew it was there from
 on-list discussion):
 
 https://btrfs.wiki.kernel.org/index.php/RAID56

 So raid5 is much more useless than I assumed. I read Marc's blog and
 figured that btrfs was ready enough.
 
 I'm really in trouble now. I tried to get rid of raid5 by doing a convert
 balance to raid1. But of course this triggered the same issue. And now I
 have a dead system, because the first thing btrfs does after mounting is
 continue the balance, which will crash the system and send me into a
 vicious loop.
 
 - How can I stop btrfs from continuing balancing?

That one's easy.  See the Documentation/filesystems/btrfs.txt file in the 
kernel tree or the wiki for btrfs mount options, one of which is 
skip_balance, to address this very sort of problem! =:^)

Alternatively, mounting it read-only should prevent further changes 
including the balance, at least allowing you to get the data off the 
filesystem.
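
In other words, something along these lines (device and mountpoint taken 
from your earlier output, adjust as needed):

  mount -o skip_balance /dev/sde /home
  btrfs balance cancel /home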

 - How can I salvage this situation and convert to raid1?
 
 Unfortunately I have few spare drives left. Not enough to contain
 4.7TiB of data... :(

[OK, this goes a bit philosophical, but it's something to think about...]

If you've done your research and followed the advice of the warnings when 
you do a mkfs.btrfs or on the wiki, not a problem, since you know that 
btrfs is still under heavy development and that as a result, it's even 
more critical to have current tested backups for anything you value 
anyway.  Simply use those backups.

Which, by definition, means that if you don't have such backups, you 
didn't consider the data all that valuable after all, actions perhaps 
giving the lie to your claims.  And no excuse for not doing the research 
either, since if you really care about your data, you research a 
filesystem you're not familiar with before trusting your data to it.  So 
again, if you didn't know btrfs was experimental and thus didn't have 
those backups, by definition your actions say you didn't really care 
about the data you put on it, no matter what your words might say.

OTOH, there *IS* such a thing as not realizing the value of something 
until you're in the process of losing it... that I do understand.  But of 
course try telling that to, for instance, someone who has just lost a 
loved one that they never actually /told/ them that...  Sometimes it's 
simply too late.  Tho if it's going to happen, at least here I'd much 
rather it happen to some data, than one of my own loved ones...


Anyway, at least for now you should still be able to recover most of the 
data using skip_balance or read-only mounting.  My guess is that if push 
comes to shove you can either prioritize that data and give up a TiB or 
two if it comes to that, or scrimp here and there, putting a few gigs on 
the odd blank DVD you may have lying around or downgrading a few meals to 
ramen-noodle to afford the $100 or so shipped that pricewatch says a new 
3 TB drive costs, these days.  I've been there, and have found that if I 
think I need it bad enough, that $100 has a way of appearing, like I said 
even if I'm noodling it for a few meals to do it.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
