System freeze mounting USB BTRFS filesystem

2017-01-05 Thread Steven Haigh
:  [] ? btrfs_cleanup_fs_roots+0x14c/0x180 [btrfs]
 kernel:  [] open_ctree+0x2045/0x2520 [btrfs]
 kernel:  [] btrfs_mount+0xda2/0xef0 [btrfs]
 kernel:  [] ? find_next_zero_bit+0x1d/0x20
 kernel:  [] ? find_next_bit+0x18/0x20
 kernel:  [] mount_fs+0x38/0x150
 kernel:  [] ? __alloc_percpu+0x15/0x20
 kernel:  [] vfs_kern_mount+0x67/0x100
 kernel:  [] btrfs_mount+0x1a3/0xef0 [btrfs]
 kernel:  [] ? find_next_zero_bit+0x1d/0x20
 kernel:  [] mount_fs+0x38/0x150
 kernel:  [] ? __alloc_percpu+0x15/0x20
 kernel:  [] vfs_kern_mount+0x67/0x100
 kernel:  [] do_mount+0x1dd/0xc50
 kernel:  [] ? __check_object_size+0x105/0x1dc
 kernel:  [] ? memdup_user+0x4f/0x70
 kernel:  [] SyS_mount+0x83/0xd0
 kernel:  [] entry_SYSCALL_64_fastpath+0x1a/0xa4
 kernel: Code: 48 8b 4d 98 4c 89 e2 4c 89 ee ff d0 49 8b 06 48 85 c0 75
da e9 78 ff ff ff 48 98 48 c1 e0 12 48 89 85 78 ff ff ff e9 d6 fe ff ff
<8b> 8f 5c 03 00 00 48 8b 45 98 31 d2 c1 e1 04 48 f7 f1 8d 1c 00
 kernel: RIP  [] flush_space+0x2ed/0x620 [btrfs]
 kernel:  RSP 
 kernel: CR2: 035c
 kernel: ---[ end trace c49186b736aa143c ]---

Has anyone seen this recently or have any ideas on how to fix this problem?

Please CC me as well as the list as I don't believe I'm currently
subscribed to this specific list.
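
In the meantime, the only workaround I can think of trying is a read-only
mount with the backup root, along these lines (a sketch only - the device name
is a placeholder, and I'm assuming this kernel still accepts the
usebackuproot / recovery options; whether it dodges the flush_space path is
pure guesswork):

# Try to get the filesystem mounted without writing anything:
mount -t btrfs -o ro,usebackuproot /dev/sdX1 /mnt/usb
# On older kernels the equivalent option was called "recovery":
mount -t btrfs -o ro,recovery /dev/sdX1 /mnt/usb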

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897






Re: Is stability a joke?

2016-09-14 Thread Steven Haigh

On 2016-09-15 11:07, Nicholas D Steeves wrote:

On Mon, Sep 12, 2016 at 01:31:42PM -0400, Austin S. Hemmelgarn wrote:
In general yes in this case, but performance starts to degrade exponentially
beyond a certain point. The difference between (for example) 10 and 20
snapshots is not as much as between 1000 and 1010. The problem here is that
we don't really have a BCP document that anyone ever reads. A lot of stuff
that may seem obvious to us after years of working with BTRFS isn't going to
be to a newcomer, and it's a lot more likely that some random person will
get things right if we have a good, central BCP document than if it stays as

scattered tribal knowledge.


"Scattered tribal knowledge"...exactly!  :-D


+1 also.

I haven't been following this closely due to other commitments - but I'm 
happy to see the progress on the 'stability matrix' added to the wiki page.


It may seem trivial to people who live, eat, and breathe BTRFS, but for 
others, it saves stress, headaches and data loss.


I can't emphasise enough how important getting this part right is - at least 
until some future date where *everything* just works.


--
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


Re: compress=lzo safe to use?

2016-09-11 Thread Steven Haigh

On 2016-09-12 05:48, Martin Steigerwald wrote:

On Sunday, 26 June 2016 at 13:13:04 CEST, Steven Haigh wrote:

On 26/06/16 12:30, Duncan wrote:
> Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
>> In every case, it was a flurry of csum error messages, then instant
>> death.
>
> This is very possibly a known bug in btrfs, that occurs even in raid1
> where a later scrub repairs all csum errors.  While in theory btrfs raid1
> should simply pull from the mirrored copy if its first try fails checksum
> (assuming the second one passes, of course), and it seems to do this just
> fine if there's only an occasional csum error, if it gets too many at
> once, it *does* unfortunately crash, despite the second copy being
> available and being just fine as later demonstrated by the scrub fixing
> the bad copy from the good one.
>
> I'm used to dealing with that here any time I have a bad shutdown (and
> I'm running live-git kde, which currently has a bug that triggers a
> system crash if I let it idle and shut off the monitors, so I've been
> getting crash shutdowns and having to deal with this unfortunately often,
> recently).  Fortunately I keep my root, with all system executables, etc,
> mounted read-only by default, so it's not affected and I can /almost/
> boot normally after such a crash.  The problem is /var/log and /home
> (which has some parts of /var that need to be writable symlinked into /
> home/var, so / can stay read-only).  Something in the normal after-crash
> boot triggers enough csum errors there that I often crash again.
>
> So I have to boot to emergency mode and manually mount the filesystems in
> question, so nothing's trying to access them until I run the scrub and
> fix the csum errors.  Scrub itself doesn't trigger the crash, thankfully,
> and once it has repaired all the csum errors due to partial writes on one
> mirror that either were never made or were properly completed on the
> other mirror, I can exit emergency mode and complete the normal boot (to
> the multi-user default target).  As there's no more csum errors then
> because scrub fixed them all, the boot doesn't crash due to too many such
> errors, and I'm back in business.
>
>
> Tho I believe at least the csum bug that affects me may only trigger if
> compression is (or perhaps has been in the past) enabled.  Since I run
> compress=lzo everywhere, that would certainly affect me.  It would also
> explain why the bug has remained around for quite some time as well,
> since presumably the devs don't run with compression on enough for this
> to have become a personal itch they needed to scratch, thus its remaining
> untraced and unfixed.
>
> So if you weren't using the compress option, your bug is probably
> different, but either way, the whole thing about too many csum errors at
> once triggering a system crash sure does sound familiar, here.

Yes, I was running the compress=lzo option as well... Maybe here lies a
common problem?


Hmm… I found this thread after being referred to it by the Debian wiki page on
BTRFS¹.

I have used compress=lzo on BTRFS RAID 1 since April 2014 and I have never
found an issue. Steven, your filesystem wasn't RAID 1 but RAID 5 or 6?


Yes, I was using RAID6 - and it has had a track record of eating data. 
There are lots of problems with the implementation / correctness of RAID5/6 
parity - which I'm pretty sure haven't been nailed down yet. The 
recommendation at the moment is simply not to use the RAID5 or RAID6 modes of 
BTRFS. The last I heard, if you were using RAID5/6 in BTRFS, the recommended 
action was to migrate your data to a different profile or a different FS.
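
To be concrete about what 'migrate to a different profile' looks like (a
rough sketch only - the mount point is a placeholder, and you need enough
devices and free space for the target profile):

# Convert data and metadata away from raid5/6 while the filesystem still mounts read-write:
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/array
# Check on progress from another shell:
btrfs balance status /mnt/array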


I just want to assess whether using compress=lzo might be dangerous in my
setup. Actually, right now I'd like to keep using it, since I think at least
one of the SSDs does not compress. And… well… /home and /, where I use it,
are both quite full already.


I don't believe the compress=lzo option by itself was a problem - but it 
*may* have an impact on the RAID5/6 parity problems? I'd be guessing here, 
but I'm happy to be corrected.


--
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


Re: Is stability a joke?

2016-09-11 Thread Steven Haigh
This. So much this.

After being burned badly by the documentation / wiki etc. making RAID5/6
seem stable, I think it's a joke how the features of BTRFS are promoted.

A lot of what is marked as 'Implemented' or 'Complete' is little more than
an "In theory, it works" - but will eat your data.

Having a simple reference to the status of each feature, and to what will
eat your data, would probably save TBs of data in the next few months - and
a lot of reputation for BTRFS...

On 11/09/16 18:55, Waxhead wrote:
> I have been following BTRFS for years and have recently been starting to
> use BTRFS more and more, and as always BTRFS' stability is a hot topic.
> Some say that BTRFS is a dead-end research project while others claim
> the opposite.
> 
> Taking a quick glance at the wiki does not say much about what is safe
> to use or not, and it also points to some who are using BTRFS in production.
> While BTRFS can apparently work well in production it does have some
> caveats, and finding out which features are safe or not can be problematic.
> I especially think that new users of BTRFS can easily be bitten if
> they do not do a lot of research on it first.
> 
> The Debian wiki for BTRFS (which is recent, by the way) contains a bunch
> of warnings and recommendations and is, for me, a bit better than the
> official BTRFS wiki when it comes to deciding which features to use.
> 
> The Nouveau graphics driver has a nice feature matrix on its webpage,
> and I think that BTRFS should perhaps consider doing something like that
> on its official wiki as well.
> 
> For example, something along the lines of this (the statuses are taken
> out of thin air just for demonstration purposes):
> 
> Kernel version 4.7
> +----------------------------+--------+--------+--------+--------+--------+-------+--------+
> | Feature / Redundancy level | Single | Dup    | Raid0  | Raid1  | Raid10 | Raid5 | Raid 6 |
> +----------------------------+--------+--------+--------+--------+--------+-------+--------+
> | Subvolumes                 | Ok     | Ok     | Ok     | Ok     | Ok     | Bad   | Bad    |
> | Snapshots                  | Ok     | Ok     | Ok     | Ok     | Ok     | Bad   | Bad    |
> | LZO Compression            | Bad(1) | Bad    | Bad    | Bad(2) | Bad    | Bad   | Bad    |
> | ZLIB Compression           | Ok     | Ok     | Ok     | Ok     | Ok     | Bad   | Bad    |
> | Autodefrag                 | Ok     | Bad    | Bad(3) | Ok     | Ok     | Bad   | Bad    |
> +----------------------------+--------+--------+--------+--------+--------+-------+--------+
> 
> 
> (1) Some explanation here...
> (2) Some explanation there
> (3) And some explanation elsewhere...
> 
> ...etc...etc...
> 
> I therefore would like to propose that some sort of feature / stability
> matrix for the latest kernel is added to the wiki, preferably somewhere
> where it is easy to find. It would be nice to archive old matrices as
> well in case someone runs a slightly older kernel (we who use Debian tend
> to like older kernels). In my opinion it would make things a bit easier
> and perhaps a bit less scary too. Remember, if you get bitten badly once
> you tend to stay away from it all just in case; if you on the other
> hand know what bites, you can safely pet the fluffy end instead :)


-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: [PATCH 1/2] btrfs-progs: mkfs: Warn user for minimal RAID5/6 devices setup

2016-09-01 Thread Steven Haigh
Is it worthwhile adding a note that RAID5 / RAID6 may very well eat your
data at this stage?
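
For example, a quick way to exercise the new warning paths with throw-away
loop devices (a sketch only; file and device names are made up):

truncate -s 3G disk1.img disk2.img
losetup /dev/loop0 disk1.img
losetup /dev/loop1 disk2.img
# Per the patch below, a 2-device RAID5 should now print a warning at mkfs time:
mkfs.btrfs -f -d raid5 -m raid5 /dev/loop0 /dev/loop1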

On 02/09/16 11:41, Qu Wenruo wrote:
> For RAID5, 2 devices setup is just RAID1 with more overhead.
> For RAID6, 3 devices setup is RAID1 with 3 copies, not what most user
> want.
> 
> So warn user at mkfs time for such case, and add explain in man pages.
> 
> Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
> ---
>  Documentation/mkfs.btrfs.asciidoc | 15 +++
>  utils.c   | 10 --
>  2 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/mkfs.btrfs.asciidoc b/Documentation/mkfs.btrfs.asciidoc
> index 98fe694..846c08f 100644
> --- a/Documentation/mkfs.btrfs.asciidoc
> +++ b/Documentation/mkfs.btrfs.asciidoc
> @@ -263,18 +263,25 @@ There are the following block group types available:
>  .2+^.<h| Profile   3+^.^h| Redundancy   .2+^.<h| Min/max devices
>^.^h| Copies   ^.^h| Parity ^.<h| Striping
>  | single  | 1||| 1/any
> -| DUP | 2 / 1 device ||| 1/any ^(see note)^
> +| DUP | 2 / 1 device ||| 1/any ^(see note1)^
>  | RAID0   |  || 1 to N | 2/any
>  | RAID1   | 2||| 2/any
>  | RAID10  | 2|| 1 to N | 4/any
> -| RAID5   | 1| 1  | 2 to N - 1 | 2/any
> -| RAID6   | 1| 2  | 3 to N - 2 | 3/any
> +| RAID5   | 1| 1  | 2 to N - 1 | 2/any ^(see note2)^
> +| RAID6   | 1| 2  | 3 to N - 2 | 3/any ^(see note3)^
>  |=
>  
> -'Note:' DUP may exist on more than 1 device if it starts on a single device and
> +'Note1:' DUP may exist on more than 1 device if it starts on a single device and
>  another one is added. Since version 4.5.1, *mkfs.btrfs* will let you create DUP
>  on multiple devices.
>  
> +'Note2:' It's not recommended to use 2 devices RAID5. In that case,
> +parity stripe will contains the same data of data stripe, making RAID5 degraded
> +to RAID1 with more overhead.
> +
> +'Note3:' It's also not recommended to use 3 devices RAID6, unless one wants to
> +get 3 copies RAID1, which btrfs doesn't provide yet.
> +
>  DUP PROFILES ON A SINGLE DEVICE
>  ---
>  
> diff --git a/utils.c b/utils.c
> index 82f3376..1d6879a 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -3314,6 +3314,7 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
>   u64 dev_cnt, int mixed, int ssd)
>  {
>   u64 allowed = 0;
> + u64 profile = metadata_profile | data_profile;
>  
>   switch (dev_cnt) {
>   default:
> @@ -3328,8 +3329,7 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
>   allowed |= BTRFS_BLOCK_GROUP_DUP;
>   }
>  
> - if (dev_cnt > 1 &&
> - ((metadata_profile | data_profile) & BTRFS_BLOCK_GROUP_DUP)) {
> + if (dev_cnt > 1 && profile & BTRFS_BLOCK_GROUP_DUP) {
>   warning("DUP is not recommended on filesystem with multiple 
> devices");
>   }
>   if (metadata_profile & ~allowed) {
> @@ -3349,6 +3349,12 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
>   return 1;
>   }
>  
> + if (dev_cnt == 3 && profile & BTRFS_BLOCK_GROUP_RAID6) {
> + warning("RAID6 is not recommended on filesystem with 3 devices 
> only");
> + }
> +     if (dev_cnt == 2 && profile & BTRFS_BLOCK_GROUP_RAID5) {
> + warning("RAID5 is not recommended on filesystem with 2 devices 
> only");
> + }
>   warning_on(!mixed && (data_profile & BTRFS_BLOCK_GROUP_DUP) && ssd,
>  "DUP may not actually lead to 2 copies on the device, see 
> manual page");
>  
> 

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: btrfs replace performance with missing drive

2016-07-14 Thread Steven Haigh
Pray that you have no issue with anything in the short term and that you
don't lose power to the system while it is going on.

I did exactly as you are now and ended up with a corrupted filesystem
due to what you are seeing.

DO NOT interrupt it, or you may have big problems with filesystem
integrity afterwards.

See my previous posts to this list for details on what happened to me.
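
If you just want to keep an eye on it without touching anything, the status
query is read-only and should be safe (a sketch; the mount point is a
placeholder):

# Print the current replace progress once and exit:
btrfs replace status -1 /mnt/array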

On 14/07/2016 9:18 PM, Sébastien Luttringer wrote:
> Hello,
> 
> I have a performance issue with «btrfs replace» with raid5 and a _missing_
> device. My btrfs rely on 6x4TB HDD and the operating system is an Archlinux.
> 
> In a nutshell, I will need 23 to 46 days to replace one missing disk.
> 
> # btrfs fi sh /home
> Label: 'raptor.home'  uuid: 8739c8b2-110b-44ac-8b4d-285ad06ee446
> Total devices 7 FS bytes used 14.60TiB
> devid    0 size 3.64TiB used 2.80TiB path /dev/sdf
> devid    3 size 3.64TiB used 2.97TiB path /dev/sdh
> devid    5 size 3.64TiB used 2.97TiB path /dev/sdc
> devid    6 size 3.64TiB used 2.97TiB path /dev/sdd
> devid    7 size 3.64TiB used 2.97TiB path /dev/sde
> devid    8 size 3.64TiB used 2.97TiB path /dev/sdg
> *** Some devices missing
> 
> 
> At full disk speed (100 MB/s), replacing the missing disk (4 TB) should take
> around 8 hours. With the same disk model and same HBA card in another computer
> with mdadm/raid5, I could verify this duration can be reached.
> 
> I also tested a «btrfs replace» without a missing disk and the speed was not
> so bad - somewhere around half disk speed (50-60MB/s). Performance is below
> mdadm.
> 
> But, in my case, the drive has passed away, so I can't use it as the source of
> the replace, and I get a replace speed of 1-2MB/s! Which means between 23-46
> days, with bad usage performance and a security risk.
> 
> I tried to upgrade the kernel to the latest (4.7-rc6) but it's not better in
> performance. I got some crash during replace with 4.6.0 which vanish with the
> last rc.
> 
> # iostat -md
> Linux 4.7.0-rc6-seblu (raptor.seblu.net)    14/07/2016    _x86_64_    (4 CPU)
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdc             356,75        22,51         0,14    9132054      58427
> sdd             356,27        22,51         0,14    9131612      57094
> sde             361,53        22,52         0,14    9132207      57245
> sdf             362,78         0,00         1,81          4     735786
> sdg             357,82        22,51         0,14    9131763      58323
> sdh             325,25        22,52         0,14    9132715      58355
> 
> 
> So I get really poor performance rebuilding a raid5, mostly when the
> replaced device is missing.
> Is there a parameter to tweak or something I can do to improve the replace?
> 
> Regards,
> 

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Steven Haigh
On 29/06/16 04:01, Chris Murphy wrote:
> Just wiping the slate clean to summarize:
> 
> 
> 1. We have a consistent ~1 in 3 maybe 1 in 2, reproducible corruption
> of *data extent* parity during a scrub with raid5. Goffredo and I have
> both reproduced it. It's a big bug. It might still be useful if
> someone else can reproduce it too.
> 
> Goffredo, can you file a bug at bugzilla.kernel.org and reference your
> bug thread?  I don't know if the key developers know about this, it
> might be worth pinging them on IRC once the bug is filed.
> 
> Unknown if it affects balance, or raid 6. And if it affects raid 6, is
> p or q corrupted, or both? Unknown how this manifests on metadata
> raid5 profile (only tested was data raid5). Presumably if there is
> metadata corruption that's fixed during a scrub, and its parity is
> overwritten with corrupt parity, the next time there's a degraded
> state, the file system would face plant somehow. And we've seen quite
> a few degraded raid5's (and even 6's) face plant in inexplicable ways
> and we just kinda go, shit. Which is what the fs is doing when it
> encounters a pile of csum errors. It treats the csum errors as a
> signal to disregard the fs rather than maybe only being suspicious of
> the fs. Could it turn out that these file systems were recoverable,
> just that Btrfs wasn't tolerating any csum error and wouldn't proceed
> further?

I believe this is the same case for RAID6, based on my experiences. I
actually wondered if the system halts were the result of a TON of csum
error messages being logged - not the actual errors themselves. Just about
every system hang went to 100% CPU usage on all cores, and the system just
stopped, after a flood of csum errors. If there were only one or two (or I
copied data off via a network connection where the read rate was slower), I
found I had a MUCH lower chance of the system locking up.

In fact, now that I think about it, when I was copying data to an
external USB drive (maxed out at ~30MB/sec), I still got csum errors -
but the system never hung.

Every crash ended with the last line along the lines of "Stopped
recurring error. Your system needs rebooting". I wonder if this error
reporting was altered, that the system wouldn't go down.

Of course I have no way of testing this.
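
The only cheap thing I can think of that comes close to a test is keeping the
flood off the console (a sketch only - I'm assuming the hang is aggravated by
console logging, which is pure speculation on my part):

# Stop INFO-level csum spam being written to the (slow, serial/Xen) console;
# the messages still land in dmesg / the ring buffer:
dmesg -n 3
# or equivalently:
sysctl -w kernel.printk="3 4 1 3"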


-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Steven Haigh
On 28/06/16 22:25, Austin S. Hemmelgarn wrote:
> On 2016-06-28 08:14, Steven Haigh wrote:
>> On 28/06/16 22:05, Austin S. Hemmelgarn wrote:
>>> On 2016-06-27 17:57, Zygo Blaxell wrote:
>>>> On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote:
>>>>> On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn
>>>>> <ahferro...@gmail.com> wrote:
>>>>>> On 2016-06-25 12:44, Chris Murphy wrote:
>>>>>>> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn
>>>>>>> <ahferro...@gmail.com> wrote:
>>>>>>>
>>>>>>> OK but hold on. During scrub, it should read data, compute checksums
>>>>>>> *and* parity, and compare those to what's on-disk - > EXTENT_CSUM in
>>>>>>> the checksum tree, and the parity strip in the chunk tree. And if
>>>>>>> parity is wrong, then it should be replaced.
>>>>>>
>>>>>> Except that's horribly inefficient.  With limited exceptions
>>>>>> involving
>>>>>> highly situational co-processors, computing a checksum of a parity
>>>>>> block is
>>>>>> always going to be faster than computing parity for the stripe.  By
>>>>>> using
>>>>>> that to check parity, we can safely speed up the common case of near
>>>>>> zero
>>>>>> errors during a scrub by a pretty significant factor.
>>>>>
>>>>> OK I'm in favor of that. Although somehow md gets away with this by
>>>>> computing and checking parity for its scrubs, and still manages to
>>>>> keep drives saturated in the process - at least HDDs, I'm not sure how
>>>>> it fares on SSDs.
>>>>
>>>> A modest desktop CPU can compute raid6 parity at 6GB/sec, a less-modest
>>>> one at more than 10GB/sec.  Maybe a bottleneck is within reach of an
>>>> array of SSDs vs. a slow CPU.
>>> OK, great for people who are using modern desktop or server CPU's.  Not
>>> everyone has that luxury, and even on many such CPU's, it's _still_
>>> faster to compute CRC32c checksums.  On top of that, we don't appear to
>>> be using the in-kernel parity-raid libraries (or if we are, I haven't
>>> been able to find where we are calling the functions for it), so we
>>> don't necessarily get assembly optimized or co-processor accelerated
>>> computation of the parity itself.  The other thing that I didn't mention
>>> above though, is that computing parity checksums will always take less
>>> time than computing parity, because you have to process significantly
>>> less data.  On a 4 disk RAID5 array, you're processing roughly 2/3 as
>>> much data to do the parity checksums instead of parity itself, which
>>> means that the parity computation would need to be 200% faster than the
>>> CRC32c computation to break even, and this margin gets bigger and bigger
>>> as you add more disks.
>>>
>>> On small arrays, this obviously won't have much impact.  Once you start
>>> to scale past a few TB though, even a few hundred MB/s faster processing
>>> means a significant decrease in processing time.  Say you have a CPU
>>> which gets about 12.0GB/s for RAID5 parity, and about 12.25GB/s for
>>> CRC32c (~2% is a conservative ratio assuming you use the CRC32c
>>> instruction and assembly optimized RAID5 parity computations on a modern
>>> x86_64 processor (the ratio on both the mobile Core i5 in my laptop and
>>> the Xeon E3 in my home server is closer to 5%)).  Assuming those
>>> numbers, and that we're already checking checksums on non-parity blocks,
>>> processing 120TB of data in a 4 disk array (which gives 40TB of parity
>>> data, so 160TB total) gives:
>>> For computing the parity to scrub:
>>> 120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
>>> regular data
>>> 120TB / 12GB    = 10000 seconds for processing parity of all stripes
>>> = 19795.9 seconds total
>>> ~ 5.4 hours total
>>>
>>> For computing csums of the parity:
>>> 120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
>>> regular data
>>> 40TB / 12.25GB  =  3265.3 seconds for processing CRC32c csums of all the
>>> parity data
>>> = 13061.2 seconds total
>>> ~ 3.6 hours total
>>>
>>> The checksum based computation is approximately 34% faster than the parity
>>> computation.

Re: Adventures in btrfs raid5 disk recovery

2016-06-28 Thread Steven Haigh
On 28/06/16 22:05, Austin S. Hemmelgarn wrote:
> On 2016-06-27 17:57, Zygo Blaxell wrote:
>> On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote:
>>> On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn
>>> <ahferro...@gmail.com> wrote:
>>>> On 2016-06-25 12:44, Chris Murphy wrote:
>>>>> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn
>>>>> <ahferro...@gmail.com> wrote:
>>>>>
>>>>> OK but hold on. During scrub, it should read data, compute checksums
>>>>> *and* parity, and compare those to what's on-disk - > EXTENT_CSUM in
>>>>> the checksum tree, and the parity strip in the chunk tree. And if
>>>>> parity is wrong, then it should be replaced.
>>>>
>>>> Except that's horribly inefficient.  With limited exceptions involving
>>>> highly situational co-processors, computing a checksum of a parity
>>>> block is
>>>> always going to be faster than computing parity for the stripe.  By
>>>> using
>>>> that to check parity, we can safely speed up the common case of near
>>>> zero
>>>> errors during a scrub by a pretty significant factor.
>>>
>>> OK I'm in favor of that. Although somehow md gets away with this by
>>> computing and checking parity for its scrubs, and still manages to
>>> keep drives saturated in the process - at least HDDs, I'm not sure how
>>> it fares on SSDs.
>>
>> A modest desktop CPU can compute raid6 parity at 6GB/sec, a less-modest
>> one at more than 10GB/sec.  Maybe a bottleneck is within reach of an
>> array of SSDs vs. a slow CPU.
> OK, great for people who are using modern desktop or server CPU's.  Not
> everyone has that luxury, and even on many such CPU's, it's _still_
> faster to compute CRC32c checksums.  On top of that, we don't appear to
> be using the in-kernel parity-raid libraries (or if we are, I haven't
> been able to find where we are calling the functions for it), so we
> don't necessarily get assembly optimized or co-processor accelerated
> computation of the parity itself.  The other thing that I didn't mention
> above though, is that computing parity checksums will always take less
> time than computing parity, because you have to process significantly
> less data.  On a 4 disk RAID5 array, you're processing roughly 2/3 as
> much data to do the parity checksums instead of parity itself, which
> means that the parity computation would need to be 200% faster than the
> CRC32c computation to break even, and this margin gets bigger and bigger
> as you add more disks.
> 
> On small arrays, this obviously won't have much impact.  Once you start
> to scale past a few TB though, even a few hundred MB/s faster processing
> means a significant decrease in processing time.  Say you have a CPU
> which gets about 12.0GB/s for RAID5 parity, and about 12.25GB/s for
> CRC32c (~2% is a conservative ratio assuming you use the CRC32c
> instruction and assembly optimized RAID5 parity computations on a modern
> x86_64 processor (the ratio on both the mobile Core i5 in my laptop and
> the Xeon E3 in my home server is closer to 5%)).  Assuming those
> numbers, and that we're already checking checksums on non-parity blocks,
> processing 120TB of data in a 4 disk array (which gives 40TB of parity
> data, so 160TB total) gives:
> For computing the parity to scrub:
> 120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
> regular data
> 120TB / 12GB    = 10000 seconds for processing parity of all stripes
> = 19795.9 seconds total
> ~ 5.4 hours total
> 
> For computing csums of the parity:
> 120TB / 12.25GB =  9795.9 seconds for processing CRC32c csums of all the
> regular data
> 40TB / 12.25GB  =  3265.3 seconds for processing CRC32c csums of all the
> parity data
> = 13061.2 seconds total
> ~ 3.6 hours total
> 
> The checksum based computation is approximately 34% faster than the
> parity computation.  Much of this of course is that you have to process
> the regular data twice for the parity computation method (once for
> csums, once for parity).  You could probably do one pass computing both
> values, but that would need to be done carefully; and, without
> significant optimization, would likely not get you much benefit other
> than cutting the number of loads in half.

And it all means jack shit because you don't get the data to disk that
quick. Who cares if it's 500% faster - if it still saturates the
throughput of the actual drives, what difference does it make?

I'm all for actual solutions, but the nirvana fallacy seems to apply here...
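
To put numbers on "the drives are the bottleneck" - raw sequential reads off
the members are nowhere near those GB/s parity figures (a sketch; device
names are placeholders):

# Rough per-drive sequential read throughput, to compare with the parity numbers above:
for d in /dev/sd{c,d,e,f,g,h}; do
    echo "$d:"
    dd if=$d of=/dev/null bs=1M count=2048 iflag=direct 2>&1 | tail -n1
done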

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: Strange behavior when replacing device on BTRFS RAID 5 array.

2016-06-27 Thread Steven Haigh
On 28/06/16 03:46, Chris Murphy wrote:
> On Mon, Jun 27, 2016 at 11:29 AM, Chris Murphy <li...@colorremedies.com> 
> wrote:
> 
>>
>> Next is to decide to what degree you want to salvage this volume and
>> keep using Btrfs raid56 despite the risks
> 
> Forgot to complete this thought. So if you get a backup, and decide
> you want to fix it, I would see if you can cancel the replace using
> "btrfs replace cancel " and confirm that it stops. And now is the
> risky part, which is whether to try "btrfs add" and then "btrfs
> remove" or remove the bad drive, reboot, and see if it'll mount with
> -o degraded, and then use add and remove (in which case you'll use
> 'remove missing').
> 
> The first you risk Btrfs still using the flaky bad drive.
> 
> The second you risk whether a degraded mount will work, and whether
> any other drive in the array has a problem while degraded (like an
> unrecovery read error from a single sector).

This is the exact set of circumstances that caused my corrupt array. I
was using RAID6 - yet it still corrupted large portions of things. In
theory, due to having double parity, it should still have survived even
if a disk did go bad - but there we are.

I first started a replace - noted how slow it was going - cancelled the
replace, then did an add / delete - the system crashed and it was all over.

Just as another data point, I've been flogging the guts out of the array
with mdadm RAID6, doing a reshape of it - and there have been no read errors,
system crashes or other problems in over 48 hours.
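
For anyone following along, the sequence being discussed above is roughly
this (a sketch only; device names and mount point are placeholders - and
given how it ended for me, not a recommendation):

btrfs replace cancel /mnt/array
# Either add the new device while the flaky one is still attached...
btrfs device add /dev/sdNEW /mnt/array
btrfs device delete /dev/sdBAD /mnt/array
# ...or pull the bad drive, reboot, and work degraded:
mount -o degraded /dev/sdGOOD /mnt/array
btrfs device add /dev/sdNEW /mnt/array
btrfs device delete missing /mnt/array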

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread Steven Haigh

On 2016-06-27 08:38, Hugo Mills wrote:

On Sun, Jun 26, 2016 at 03:33:08PM -0700, ronnie sahlberg wrote:

On Sat, Jun 25, 2016 at 7:53 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Could this explain why people have been reporting so many raid56 mode
> cases of btrfs replacing a first drive appearing to succeed just fine,
> but then they go to btrfs replace a second drive, and the array crashes
> as if the first replace didn't work correctly after all, resulting in two
> bad devices once the second replace gets under way, of course bringing
> down the array?
>
> If so, then it looks like we have our answer as to what has been going
> wrong that has been so hard to properly trace and thus to bugfix.
>
> Combine that with the raid4 dedicated parity device behavior you're
> seeing if the writes are all exactly 128 MB, with that possibly
> explaining the super-slow replaces, and this thread may have just given
> us answers to both of those until-now-untraceable issues.
>
> Regardless, what's /very/ clear by now is that raid56 mode as it
> currently exists is more or less fatally flawed, and a full scrap and
> rewrite to an entirely different raid56 mode on-disk format may be
> necessary to fix it.
>
> And what's even clearer is that people /really/ shouldn't be using raid56
> mode for anything but testing with throw-away data, at this point.
> Anything else is simply irresponsible.
>
> Does that mean we need to put a "raid56 mode may eat your babies" level
> warning in the manpage and require a --force to either mkfs.btrfs or
> balance to raid56 mode?  Because that's about where I am on it.

Agree. At this point letting ordinary users create raid56 filesystems
is counterproductive.


I would suggest:

1, a much more strongly worded warning in the wiki. Make sure there
are no misunderstandings
that they really should not use raid56 right now for new filesystems.


   I beefed up the warnings in several places in the wiki a couple of
days ago.


Not to sound rude - but I don't think these go anywhere near far enough. 
It needs to be completely obvious that there's a good chance you'll lose 
everything. IMHO that's the only way to stop BTRFS from getting the 
'data eater' reputation. It can be revisited and reworded when the 
implementation is more tested and stable.


--
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-06-26 Thread Steven Haigh

On 2016-06-27 08:33, ronnie sahlberg wrote:

On Sat, Jun 25, 2016 at 7:53 PM, Duncan <1i5t5.dun...@cox.net> wrote:

Chris Murphy posted on Sat, 25 Jun 2016 11:25:05 -0600 as excerpted:

Wow. So it sees the data strip corruption, uses good parity on disk to
fix it, writes the fix to disk, recomputes parity for some reason but
does it wrongly, and then overwrites good parity with bad parity?
That's fucked. So in other words, if there are any errors fixed up
during a scrub, you should do a 2nd scrub. The first scrub should make
sure data is correct, and the 2nd scrub should make sure the bug is
papered over by computing correct parity and replacing the bad parity.

I wonder if the same problem happens with balance or if this is just a
bug in scrub code?


Could this explain why people have been reporting so many raid56 mode
cases of btrfs replacing a first drive appearing to succeed just fine,
but then they go to btrfs replace a second drive, and the array crashes
as if the first replace didn't work correctly after all, resulting in two
bad devices once the second replace gets under way, of course bringing
down the array?

If so, then it looks like we have our answer as to what has been going
wrong that has been so hard to properly trace and thus to bugfix.

Combine that with the raid4 dedicated parity device behavior you're
seeing if the writes are all exactly 128 MB, with that possibly
explaining the super-slow replaces, and this thread may have just given
us answers to both of those until-now-untraceable issues.

Regardless, what's /very/ clear by now is that raid56 mode as it
currently exists is more or less fatally flawed, and a full scrap and
rewrite to an entirely different raid56 mode on-disk format may be
necessary to fix it.

And what's even clearer is that people /really/ shouldn't be using raid56
mode for anything but testing with throw-away data, at this point.
Anything else is simply irresponsible.

Does that mean we need to put a "raid56 mode may eat your babies" level
warning in the manpage and require a --force to either mkfs.btrfs or
balance to raid56 mode?  Because that's about where I am on it.


Agree. At this point letting ordinary users create raid56 filesystems
is counterproductive.


+1


I would suggest:

1, a much more strongly worded warning in the wiki. Make sure there
are no misunderstandings
that they really should not use raid56 right now for new filesystems.


I voiced my concern on #btrfs about this - it really should show that 
this may eat your data and is properly experimental. At the moment, it 
looks as if the features are implemented and working as expected. In my 
case with nothing out of the ordinary - I've now got ~3.8Tb free disk 
space. Certainly not ready for *ANY* kind of public use.



2, Instead of a --force flag. (Users tend to ignore --force and
warnings in documentation.)
Instead ifdef out the options to create raid56 in mkfs.btrfs.
Developers who want to test can just remove the ifdef and recompile
the tools anyway.
But if end-users have to recompile userspace, that really forces the
point that "you
really should not use this right now".


I think this is a somewhat good idea - however it should be a warning 
along the lines of:
"BTRFS RAID56 is VERY experimental and is known to corrupt data in 
certain cases. Use at your own risk!


Continue? (y/N):"


3, reach out to the documentation and fora for the major distros and
make sure they update their
documentation accordingly.
I think a lot of end-users, if they try to research something, are
more likely to go to  fora and wiki
than search out an upstream fora.


Another good idea.

I'd also recommend updates to the ArchLinux wiki - as for some reason I 
always seem to end up there when searching for a certain topic...


--
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


Re: Trying to rescue my data :(

2016-06-25 Thread Steven Haigh
On 26/06/16 12:30, Duncan wrote:
> Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
> 
>> In every case, it was a flurry of csum error messages, then instant
>> death.
> 
> This is very possibly a known bug in btrfs, that occurs even in raid1 
> where a later scrub repairs all csum errors.  While in theory btrfs raid1 
> should simply pull from the mirrored copy if its first try fails checksum 
> (assuming the second one passes, of course), and it seems to do this just 
> fine if there's only an occasional csum error, if it gets too many at 
> once, it *does* unfortunately crash, despite the second copy being 
> available and being just fine as later demonstrated by the scrub fixing 
> the bad copy from the good one.
> 
> I'm used to dealing with that here any time I have a bad shutdown (and 
> I'm running live-git kde, which currently has a bug that triggers a 
> system crash if I let it idle and shut off the monitors, so I've been 
> getting crash shutdowns and having to deal with this unfortunately often, 
> recently).  Fortunately I keep my root, with all system executables, etc, 
> mounted read-only by default, so it's not affected and I can /almost/ 
> boot normally after such a crash.  The problem is /var/log and /home 
> (which has some parts of /var that need to be writable symlinked into /
> home/var, so / can stay read-only).  Something in the normal after-crash 
> boot triggers enough csum errors there that I often crash again.
> 
> So I have to boot to emergency mode and manually mount the filesystems in 
> question, so nothing's trying to access them until I run the scrub and 
> fix the csum errors.  Scrub itself doesn't trigger the crash, thankfully, 
> and once it has repaired all the csum errors due to partial writes on one 
> mirror that either were never made or were properly completed on the 
> other mirror, I can exit emergency mode and complete the normal boot (to 
> the multi-user default target).  As there's no more csum errors then 
> because scrub fixed them all, the boot doesn't crash due to too many such 
> errors, and I'm back in business.
> 
> 
> Tho I believe at least the csum bug that affects me may only trigger if 
> compression is (or perhaps has been in the past) enabled.  Since I run 
> compress=lzo everywhere, that would certainly affect me.  It would also 
> explain why the bug has remained around for quite some time as well, 
> since presumably the devs don't run with compression on enough for this 
> to have become a personal itch they needed to scratch, thus its remaining 
> untraced and unfixed.
> 
> So if you weren't using the compress option, your bug is probably 
> different, but either way, the whole thing about too many csum errors at 
> once triggering a system crash sure does sound familiar, here.

Yes, I was running the compress=lzo option as well... Maybe here lies a
common problem?
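
For anyone else hitting this, the after-crash routine Duncan describes above
boils down to something like this (a sketch; device and mount point are
placeholders):

# From emergency mode, mount only the affected filesystem so nothing else touches it:
mount /dev/sdX1 /home
# Repair the csum errors left by the partial writes, waiting for completion:
btrfs scrub start -B /home
btrfs scrub status /home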

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: Trying to rescue my data :(

2016-06-25 Thread Steven Haigh
On 26/06/16 02:25, Chris Murphy wrote:
> On Fri, Jun 24, 2016 at 10:19 PM, Steven Haigh <net...@crc.id.au> wrote:
> 
>>
>> Interesting though that EVERY crash references:
>> kernel BUG at fs/btrfs/extent_io.c:2401!
> 
> Yeah because you're mounted ro, and if this is 4.4.13 unmodified btrfs
> from kernel.org then that's the 3rd line:
> 
> if (head->is_data) {
> ret = btrfs_del_csums(trans, root,
>node->bytenr,
>node->num_bytes);
> 
> So why/what is it cleaning up if it's mounted ro? Anyway, once you're
> no longer making forward progress you could try something newer,
> although it's a coin toss what to try. There are some issues with
> 4.6.0-4.6.2 but there have been a lot of changes in btrfs/extent_io.c
> and btrfs/raid56.c between 4.4.13 that you're using and 4.6.2, so you
> could try that or even build 4.7.rc4 or rc5 by tomorrowish and see how
> that fairs. It sounds like there's just too much (mostly metadata)
> corruption for the degraded state to deal with so it may not matter.
> I'm really skeptical of btrfsck on degraded fs's so I don't think
> that'll help.

Well, I did end up recovering the data that I cared about. I'm not
really keen to ride the BTRFS RAID6 train again any time soon :\

I now have the same as I've had for years - md RAID6 with XFS on top of
it. I'm still copying data back to the array from the various sources I
had to copy it to so I had enough space to do so.

What I find interesting is that the pattern of corruption in the BTRFS
RAID6 is quite clustered. I have ~80Gb of MP3s ripped over the years -
of that, the corruption would take out 3-4 songs in a row, then the next
10 albums or so were intact. What made recovery VERY hard is that it
got into several situations that just caused a complete system hang.

I tried it on bare metal - just in case it was a Xen thing, but it hard
hung the entire machine then. In every case, it was a flurry of csum
error messages, then instant death. I would have been much happier if
the file had been skipped or returned as unavailable instead of having
the entire machine crash.

I ended up putting the bit of script that I posted earlier in
/etc/rc.local - then just kept doing:
xl destroy myvm && xl create /etc/xen/myvm -c

Wait for the crash, run the above again.
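
The cycle could probably be scripted along these lines (a sketch only; the
domain name is a placeholder, the -c console attach is dropped so the loop
doesn't block, and it assumes the guest's on_crash action actually destroys
the domain):

# Keep restarting the recovery VM every time it crashes:
while true; do
    xl destroy myvm 2>/dev/null
    xl create /etc/xen/myvm
    # Wait until the domain disappears again (i.e. it has crashed):
    while xl list myvm >/dev/null 2>&1; do
        sleep 30
    done
done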

All in all, it took me about 350 boots, with an average uptime of about 3
minutes, to get the data out that I decided to keep. While not strictly a
BTRFS-caused loss, I did decide, given how long it was going to take, not to
bother recovering ~3.5Tb of other data that is easily available in other
places on the internet. If I really need the Fedora 24 KDE Spin ISO, or the
CentOS 6 Install DVD, etc. etc., I can download it again.

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: Trying to rescue my data :(

2016-06-24 Thread Steven Haigh
On 25/06/2016 3:50 AM, Austin S. Hemmelgarn wrote:
> On 2016-06-24 13:43, Steven Haigh wrote:
>> On 25/06/16 03:40, Austin S. Hemmelgarn wrote:
>>> On 2016-06-24 13:05, Steven Haigh wrote:
>>>> On 25/06/16 02:59, ronnie sahlberg wrote:
>>>> What I have in mind here is that a file seems to get CREATED when I
>>>> copy
>>>> the file that crashes the system in the target directory. I'm thinking
>>>> if I 'cp -an source/ target/' that it will make this somewhat easier
>>>> (it
>>>> won't overwrite the zero byte file).
>>> You may want to try with rsync (rsync -vahogSHAXOP should get just about
>>> everything possible out of the filesystem except for some security
>>> attributes (stuff like SELinux context), and will give you nice
>>> information about progress as well).  It will keep running in the face
>>> of individual read errors, and will only try each file once.  It also
>>> has the advantage of showing you the transfer rate and exactly where in
>>> the directory structure you are, and handles partial copies sanely too
>>> (it's more reliable restarting an rsync transfer than a cp one that got
>>> interrupted part way through).
>>
>> I may try that - I came up with this:
>> #!/bin/bash
>>
>> mount -o ro,nossd,degraded /dev/xvdc /mnt/fileshare/
>>
>> find /mnt/fileshare/data/Photos/ -type f -print0 |
>> while IFS= read -r -d $'\0' line; do
>> echo "Processing $line"
>> DIR=`dirname "$line"`
>> mkdir -p "/mnt/recover/$DIR"
>> if [ ! -e "/mnt/recover/$line" ]; then
>> echo "Copying $line to /mnt/recover/$line"
>> touch "/mnt/recover/$line"
>> sync
>> cp -f "$line" "/mnt/recover/$line"
>> sync
>> fi
>> done
>>
>> umount /mnt/fileshare
>>
>> I'm slowly picking through the data - and it has crashed a few times...
>> It seems that there are some checksum failures that don't crash the
>> entire system - so that's a good thing to know - not sure if that means
>> that it is correcting the data with parity - or something else.
>>
>> I'll see how much data I can extract with this and go from there - as it
>> may be good enough to call it a success.
>>
> AH, if you're having issues with crashes when you hit errors, you may
> want to avoid rsync then, it will try to reread any files that don't
> match in size and mtime, so it would likely just keep crashing on the
> same file over and over again.
> 
> Also, looking at the script you've got, that will probably run faster
> too because it shouldn't need to call stat() on everything like rsync
> does (because of the size and mtime comparison).

Well, as a data point, the data is slowly coming off the RAID6 array.
Some stuff is just dead and crashes the entire host whenever you try to
access it. At the moment, my average uptime is about 2-3 minutes...

I've added my recovery rsync script to /etc/rc.local - and I'm just
starting / destroying the VM every time it crashes.

I'm also rsync'ing the data from that system out to other areas of
storage so I can pull off as much data as possible (I don't have a spare
4.4Tb to use).

I lost a total of 5 photos out of 83Gb worth - which is good. My music
collection doesn't seem to be that lucky - which means lots of time
ripping CDs in the future :P

I haven't tried the applications / ISOs directory yet - but we'll see
how that goes when I get there...

The photos were the main thing I was concerned about, the rest is just
handy.

Interesting though that EVERY crash references:
kernel BUG at fs/btrfs/extent_io.c:2401!


-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: Trying to rescue my data :(

2016-06-24 Thread Steven Haigh
On 25/06/16 03:40, Austin S. Hemmelgarn wrote:
> On 2016-06-24 13:05, Steven Haigh wrote:
>> On 25/06/16 02:59, ronnie sahlberg wrote:
>> What I have in mind here is that a file seems to get CREATED when I copy
>> the file that crashes the system in the target directory. I'm thinking
>> if I 'cp -an source/ target/' that it will make this somewhat easier (it
>> won't overwrite the zero byte file).
> You may want to try with rsync (rsync -vahogSHAXOP should get just about
> everything possible out of the filesystem except for some security
> attributes (stuff like SELinux context), and will give you nice
> information about progress as well).  It will keep running in the face
> of individual read errors, and will only try each file once.  It also
> has the advantage of showing you the transfer rate and exactly where in
> the directory structure you are, and handles partial copies sanely too
> (it's more reliable restarting an rsync transfer than a cp one that got
> interrupted part way through).

I may try that - I came up with this:
#!/bin/bash

# Mount the damaged array read-only so nothing can make it any worse.
mount -o ro,nossd,degraded /dev/xvdc /mnt/fileshare/

# Walk the photo tree and copy each file out one at a time, syncing as we go.
# The zero-byte file touched before each copy marks files we've already tried,
# so anything that crashes the box gets skipped on the next boot.
find /mnt/fileshare/data/Photos/ -type f -print0 |
while IFS= read -r -d $'\0' line; do
    echo "Processing $line"
    DIR=`dirname "$line"`
    mkdir -p "/mnt/recover/$DIR"
    if [ ! -e "/mnt/recover/$line" ]; then
        echo "Copying $line to /mnt/recover/$line"
        touch "/mnt/recover/$line"
        sync
        cp -f "$line" "/mnt/recover/$line"
        sync
    fi
done

umount /mnt/fileshare

I'm slowly picking through the data - and it has crashed a few times...
It seems that there are some checksum failures that don't crash the
entire system - so that's a good thing to know - not sure if that means
that it is correcting the data with parity - or something else.

I'll see how much data I can extract with this and go from there - as it
may be good enough to call it a success.

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897





Re: Trying to rescue my data :(

2016-06-24 Thread Steven Haigh
On 25/06/16 02:59, ronnie sahlberg wrote:
> What I would do in this situation :
> 
> 1, Immediately stop writing to these disks/filesystem. ONLY access it
> in read-only mode until you have salvaged what can be salvaged.

That's ok - I can't even mount it in RW mode :)

> 2, get a new 5T UDB drive (they are cheap) and copy file by file off the 
> array.

I've actually got enough combined space to store stuff places in the
mean time...

> 3, when you hit files that cause panics, make a note of the inode and
> avoid touching that file again.

What I have in mind here is that a zero-byte file seems to get CREATED in the
target directory when I copy a file that crashes the system. I'm thinking
that if I 'cp -an source/ target/' it will make this somewhat easier (it
won't overwrite the zero-byte file, so the crashing file gets skipped next
time around).

> Will likely take a lot of work and time since I suspect it is a
> largely manual process. But if the data is important ...

Yeah - there's only about 80Gb on the array that I *really* care about -
the rest is just a bonus if it's there - not rage-worthy :P

> Once you have all salvageable data copied to the new drive you can
> decide on how to proceed.
> I.e. if you want to try to repair the filesystem (I have low
> confidence in this for parity raid case) or if you will simply rebuild
> a new fs from scratch.

I honestly think it'll be scorched earth and starting again with a new FS.
I'm thinking of going back to mdadm for the RAID (which has worked
perfectly for years) and maybe using a vanilla BTRFS on top of that
block device.

Anything else seems like too much work for too little reward - and lack
of confidence.
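
Roughly what I have in mind for the rebuild (a sketch only; device names are
placeholders, and I believe mdadm will let you create the array with two
'missing' slots since RAID6 can run doubly degraded - the freed drives get
added back once the data has been copied across):

# 5-drive md RAID6, created degraded with only 3 members present for now:
mdadm --create /dev/md0 --level=6 --raid-devices=5 /dev/sdb /dev/sdc /dev/sdd missing missing
# Plain single-device btrfs on top of the md block device:
mkfs.btrfs -L fileshare /dev/md0
mount /dev/md0 /mnt/fileshare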

> On Fri, Jun 24, 2016 at 9:26 AM, Steven Haigh <net...@crc.id.au> wrote:
>> On 25/06/16 00:52, Steven Haigh wrote:
>>> Ok, so I figured that despite what the BTRFS wiki seems to imply, the
>>> 'multi parity' support just isn't stable enough to be used. So, I'm
>>> trying to revert to what I had before.
>>>
>>> My setup consist of:
>>>   * 2 x 3Tb drives +
>>>   * 3 x 2Tb drives.
>>>
>>> I've got (had?) about 4.9Tb of data.
>>>
>>> My idea was to convert the existing setup using a balance to a 'single'
>>> setup, delete the 3 x 2Tb drives from the BTRFS system, then create a
>>> new mdadm based RAID6 (5 drives degraded to 3), create a new filesystem
>>> on that, then copy the data across.
>>>
>>> So, great - first the balance:
>>> $ btrfs balance start -dconvert=single -mconvert=single -f (yes, I know
>>> it'll reduce the metadata redundancy).
>>>
>>> This promptly was followed by a system crash.
>>>
>>> After a reboot, I can no longer mount the BTRFS in read-write:
>>> [  134.768908] BTRFS info (device xvdd): disk space caching is enabled
>>> [  134.769032] BTRFS: has skinny extents
>>> [  134.769856] BTRFS: failed to read the system array on xvdd
>>> [  134.776055] BTRFS: open_ctree failed
>>> [  143.900055] BTRFS info (device xvdd): allowing degraded mounts
>>> [  143.900152] BTRFS info (device xvdd): not using ssd allocation scheme
>>> [  143.900243] BTRFS info (device xvdd): disk space caching is enabled
>>> [  143.900330] BTRFS: has skinny extents
>>> [  143.901860] BTRFS warning (device xvdd): devid 4 uuid
>>> 61ccce61-9787-453e-b793-1b86f8015ee1 is missing
>>> [  146.539467] BTRFS: missing devices(1) exceeds the limit(0), writeable
>>> mount is not allowed
>>> [  146.552051] BTRFS: open_ctree failed
>>>
>>> I can mount it read only - but then I also get crashes when it seems to
>>> hit a read error:
>>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064
>>> csum 3245290974 wanted 982056704 mirror 0
>>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>>> 390821102 wanted 982056704 mirror 1
>>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>>> 550556475 wanted 982056704 mirror 1
>>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>>> 1279883714 wanted 982056704 mirror 1
>>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>>> 2566472073 wanted 982056704 mirror 1
>>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>>> 1876236691 wanted 982056704 mirror 1
>>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>>> 3350537857 wanted 982056704 mirror 1
>>> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
>>> 3319706190 wanted 982056704 mirror 1
>>> BTRFS info (device xvdc): csum fai

Re: Trying to rescue my data :(

2016-06-24 Thread Steven Haigh
On 25/06/16 00:52, Steven Haigh wrote:
> Ok, so I figured that despite what the BTRFS wiki seems to imply, the
> 'multi parity' support just isn't stable enough to be used. So, I'm
> trying to revert to what I had before.
> 
> My setup consist of:
>   * 2 x 3Tb drives +
>   * 3 x 2Tb drives.
> 
> I've got (had?) about 4.9Tb of data.
> 
> My idea was to convert the existing setup using a balance to a 'single'
> setup, delete the 3 x 2Tb drives from the BTRFS system, then create a
> new mdadm based RAID6 (5 drives degraded to 3), create a new filesystem
> on that, then copy the data across.
> 
> So, great - first the balance:
> $ btrfs balance start -dconvert=single -mconvert=single -f (yes, I know
> it'll reduce the metadata redundancy).
> 
> This promptly was followed by a system crash.
> 
> After a reboot, I can no longer mount the BTRFS in read-write:
> [  134.768908] BTRFS info (device xvdd): disk space caching is enabled
> [  134.769032] BTRFS: has skinny extents
> [  134.769856] BTRFS: failed to read the system array on xvdd
> [  134.776055] BTRFS: open_ctree failed
> [  143.900055] BTRFS info (device xvdd): allowing degraded mounts
> [  143.900152] BTRFS info (device xvdd): not using ssd allocation scheme
> [  143.900243] BTRFS info (device xvdd): disk space caching is enabled
> [  143.900330] BTRFS: has skinny extents
> [  143.901860] BTRFS warning (device xvdd): devid 4 uuid
> 61ccce61-9787-453e-b793-1b86f8015ee1 is missing
> [  146.539467] BTRFS: missing devices(1) exceeds the limit(0), writeable
> mount is not allowed
> [  146.552051] BTRFS: open_ctree failed
> 
> I can mount it read only - but then I also get crashes when it seems to
> hit a read error:
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064
> csum 3245290974 wanted 982056704 mirror 0
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 390821102 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 550556475 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1279883714 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2566472073 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1876236691 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 3350537857 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 3319706190 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2377458007 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2066127208 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 657140479 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1239359620 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1598877324 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 1082738394 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 371906697 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2156787247 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 309399 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 180814340 wanted 982056704 mirror 1
> [ cut here ]
> kernel BUG at fs/btrfs/extent_io.c:2401!
> invalid opcode:  [#1] SMP
> Modules linked in: btrfs x86_pkg_temp_thermal coretemp crct10dif_pclmul
> xor aesni_intel aes_x86_64 lrw gf128mul glue_helper pcspkr raid6_pq
> ablk_helper cryptd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
> xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 2610978113 wanted 982056704 mirror 1
> BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum
> 59610051 wanted 982056704 mirror 1
> CPU: 1 PID: 1273 Comm: kworker/u4:4 Not tainted 4.4.13-1.el7xen.x86_64 #1
> Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
> task: 880079ce12c0 ti: 880078788000 task.ti: 880078788000
> RIP: e030:[]  []
> btrfs_check_repairable+0x100/0x110 [btrfs]
> RSP: e02b:88007878bcc8  EFLAGS: 00010297
> RAX: 0001 RBX: 880079db2080 RCX: 0003

Trying to rescue my data :(

2016-06-24 Thread Steven Haigh
 88007b213f30 88007878bd88
 a03a0808 880002d15500 88007878bd18 880079ce12c0
 88007b213e40 001f 8800 88006bb0c048
Call Trace:
 [] end_bio_extent_readpage+0x428/0x560 [btrfs]
 [] bio_endio+0x40/0x60
 [] end_workqueue_fn+0x3c/0x40 [btrfs]
 [] normal_work_helper+0xc1/0x300 [btrfs]
 [] ? finish_task_switch+0x82/0x280
 [] btrfs_endio_helper+0x12/0x20 [btrfs]
 [] process_one_work+0x154/0x400
 [] worker_thread+0x11a/0x460
 [] ? __schedule+0x2bf/0x880
 [] ? rescuer_thread+0x2f0/0x2f0
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: 00 31 c0 eb d5 8d 48 02 eb d9 31 c0 45 89 e0 48 c7 c6 a0 f8 3f a0
48 c7 c7 00 05 41 a0 e8 c9 f2 fa e0 31 c0 e9 70 ff ff ff 0f 0b <0f> 0b
66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP  [] btrfs_check_repairable+0x100/0x110 [btrfs]
 RSP 
[ cut here ]


So, where to from here? Sadly, I feel there is data loss in my future,
but I'm not sure how to minimise it :\
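
One obvious loss-minimising step, given the filesystem still mounts
read-only, is to copy off everything that is still readable before
experimenting any further. A rough sketch, with the source device and the
destination path as illustrative assumptions:

$ mount -o ro,degraded /dev/xvdc /mnt/fileshare
$ rsync -aHAX /mnt/fileshare/ /mnt/recovery/

rsync carries on past individual unreadable files and lists them at the end,
though as the traces above show, a bad enough read can still bring the whole
box down mid-copy.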

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897



signature.asc
Description: OpenPGP digital signature


Re: btrfs-progs 4.6 won't build on CentOS6

2016-06-23 Thread Steven Haigh

On 2016-06-24 13:22, Eric Sandeen wrote:

On 6/23/16 8:49 PM, Steven Haigh wrote:
I've tried to build the new tools for CentOS 6 / Scientific Linux 6 / 
RHEL 6 etc.


During the build process, I see:
cmds-fi-du.c: In function 'du_calc_file_space':
cmds-fi-du.c:330: error: 'FIEMAP_EXTENT_SHARED' undeclared (first use 
in this function)
cmds-fi-du.c:330: error: (Each undeclared identifier is reported only 
once

cmds-fi-du.c:330: error: for each function it appears in.)
make: *** [cmds-fi-du.o] Error 1

I'm guessing this is probably due to a different GCC version being used?
Hopefully it's a simple fix for someone with the know-how... :)


Fair warning, the btrfs.ko in centos6 is positively ancient, and it
won't be updated.

Just in case that matters to you ... not sure if you just want to
build there, or actually use btrfs under a centos6 kernel.


Thanks Eric,

Yeah - I understand the stock kernel is ancient in btrfs terms. I've been
building 4.4.x kernels for Xen Dom0/DomU use at my site (http://xen.crc.id.au)
for several years, and that's the kernel I'm targeting with these tools.


--
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


btrfs-progs 4.6 won't build on CentOS6

2016-06-23 Thread Steven Haigh
I've tried to build the new tools for CentOS 6 / Scientific Linux 6 / 
RHEL 6 etc.


During the build process, I see:
cmds-fi-du.c: In function 'du_calc_file_space':
cmds-fi-du.c:330: error: 'FIEMAP_EXTENT_SHARED' undeclared (first use in 
this function)
cmds-fi-du.c:330: error: (Each undeclared identifier is reported only 
once

cmds-fi-du.c:330: error: for each function it appears in.)
make: *** [cmds-fi-du.o] Error 1

I'm guessing this is probably due to a different GCC version being used?
Hopefully it's a simple fix for someone with the know-how... :)
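
For what it's worth, the undeclared identifier points at the kernel headers
on the build host rather than at GCC itself: FIEMAP_EXTENT_SHARED comes from
linux/fiemap.h, and the 2.6.32-era headers that EL6 ships appear to predate
it. A quick way to check, with the second path standing in for wherever a
newer kernel source or headers tree happens to live:

$ grep FIEMAP_EXTENT_SHARED /usr/include/linux/fiemap.h
$ grep FIEMAP_EXTENT_SHARED /path/to/newer/kernel/include/uapi/linux/fiemap.h
#define FIEMAP_EXTENT_SHARED    0x00002000

If the first grep comes back empty, building against newer kernel headers (or
carrying a guarded local #define with that upstream value) should get past it.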


--
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


Re: system crash on replace

2016-06-23 Thread Steven Haigh

On 2016-06-24 05:13, Scott Talbert wrote:

On Thu, 23 Jun 2016, Steven Haigh wrote:


On 24/06/16 04:35, Austin S. Hemmelgarn wrote:

On 2016-06-23 13:44, Steven Haigh wrote:

Hi all,

Relative newbie to BTRFS, but long time linux user. I pass the full
disks from a Xen Dom0 -> guest DomU and run BTRFS within the DomU.

I've migrated my existing mdadm RAID6 to a BTRFS raid6 layout. I have a
drive that threw a few UNC errors during a subsequent balance, so I've
pulled that drive, dd'd /dev/zero to it, and am now trying to add it back
into the BTRFS system via:
btrfs replace start 4 /dev/xvdf /mnt/fileshare

The command gets accepted and the replace starts - however apart from it
being at a glacial pace, it seems to hang the system hard with the
following output:

zeus login: BTRFS info (device xvdc): not using ssd allocation scheme

BTRFS info (device xvdc): disk space caching is enabled
BUG: unable to handle kernel paging request at c90040eed000
IP: [] __memcpy+0x12/0x20
PGD 7fbc6067 PUD 7ce9e067 PMD 7b6d6067 PTE 0
Oops: 0002 [#1] SMP
Modules linked in: x86_pkg_temp_thermal coretemp crct10dif_pclmul 
btrfs
aesni_intel aes_x86_64 lrw gf128mul xor glue_helper ablk_helper 
cryptd

pcspkr raid6_pq nfsd auth
_rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront 
crc32c_intel

xen_gntalloc xen_evtchn ipv6 autofs4
CPU: 1 PID: 2271 Comm: kworker/u4:4 Not tainted 
4.4.13-1.el7xen.x86_64 #1

Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
task: 88007c5c83c0 ti: 88005b978000 task.ti: 
88005b978000
RIP: e030:[]  [] 
__memcpy+0x12/0x20

RSP: e02b:88005b97bc60  EFLAGS: 00010246
RAX: c90040eecff8 RBX: 1000 RCX: 01ff
RDX:  RSI: 88004e4ec008 RDI: c90040eed000
RBP: 88005b97bd28 R08: c90040eeb000 R09: 607a06e1
R10: 7ff0 R11:  R12: 
R13: 607a26d9 R14: 607a26e1 R15: 880062c613d8
FS:  7fd08feb88c0() GS:88007f50()
knlGS:
CS:  e033 DS:  ES:  CR0: 80050033
CR2: c90040eed000 CR3: 56004000 CR4: 00042660
Stack:
 a05afc77 88005b97bc90 81190a14 1600
 88005ba01900  607a06e1 c90040eeb000
 0002 001b  0021
Call Trace:
 [] ? lzo_decompress_biovec+0x237/0x2b0 [btrfs]
 [] ? vmalloc+0x54/0x60
 [] end_compressed_bio_read+0x1d4/0x2a0 [btrfs]
 [] ? kmem_cache_free+0xcc/0x2c0
 [] bio_endio+0x40/0x60
 [] end_workqueue_fn+0x3c/0x40 [btrfs]
 [] normal_work_helper+0xc1/0x300 [btrfs]
 [] ? finish_task_switch+0x82/0x280
 [] btrfs_endio_helper+0x12/0x20 [btrfs]
 [] process_one_work+0x154/0x400
 [] worker_thread+0x11a/0x460
 [] ? rescuer_thread+0x2f0/0x2f0
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: 74 0e 48 8b 43 60 48 2b 43 50 88 43 4e 5b 5d c3 e8 b4 fc ff ff 
eb
eb 90 90 66 66 90 66 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07  
48

a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
RIP  [] __memcpy+0x12/0x20
 RSP 
CR2: c90040eed000
---[ end trace 001cc830ec804da7 ]---
BUG: unable to handle kernel paging request at ffd8
IP: [] kthread_data+0x10/0x20
PGD 188d067 PUD 188f067 PMD 0
Oops:  [#2] SMP
Modules linked in: x86_pkg_temp_thermal coretemp crct10dif_pclmul 
btrfs
aesni_intel aes_x86_64 lrw gf128mul xor glue_helper ablk_helper 
cryptd
pcspkr raid6_pq nfsd auth_rpcgss nfs_acl lockd grace sunrpc 
ip_tables

xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
CPU: 1 PID: 2271 Comm: kworker/u4:4 Tainted: G  D
4.4.13-1.el7xen.x86_64 #1
task: 88007c5c83c0 ti: 88005b978000 task.ti: 
88005b978000

RIP: e030:[]  []
kthread_data+0x10/0x20
RSP: e02b:88005b97b980  EFLAGS: 00010002
RAX:  RBX: 0001 RCX: 0001
RDX: 81bd5080 RSI: 0001 RDI: 88007c5c83c0
RBP: 88005b97b980 R08: 0001 R09: 13f6a632677e
R10: 001f R11:  R12: 88007c5c83c0
R13:  R14: 000163c0 R15: 0001
FS:  7fd08feb88c0() GS:88007f50()
knlGS:
CS:  e033 DS:  ES:  CR0: 80050033
CR2: 0028 CR3: 56004000 CR4: 00042660
Stack:
 88005b97b998 81094751 88007f5163c0 88005b97b9e0
 8165a34f 88007b678448 88007c5c83c0 88005b97c000
 88007c5c8710  88007c5c83c0 88007cffacc0
Call Trace:
 [] wq_worker_sleeping+0x11/0x90
 [] __schedule+0x3bf/0x880
 [] schedule+0x35/0x80
 [] do_exit+0x65f/0xad0
 [] oops_end+0x9a/0xd0
 [] no_context+0x10d/0x360
 [] __bad_area_nosemaphore+0x109/0x210
 [] bad_area_nosemaphore+0x13/0x20
 [] __do_page_fault+0x80/0x3f0
 [] ? vmap_page_range_noflush+0x284/0x390
 [] do_page_fault+0x22/0x30
 [] page_fault+0x28/0x30
 [] ? __memcpy+0x12/0

Re: system crash on replace

2016-06-23 Thread Steven Haigh
On 24/06/16 04:35, Austin S. Hemmelgarn wrote:
> On 2016-06-23 13:44, Steven Haigh wrote:
>> Hi all,
>>
>> Relative newbie to BTRFS, but long time linux user. I pass the full
>> disks from a Xen Dom0 -> guest DomU and run BTRFS within the DomU.
>>
>> I've migrated my existing mdadm RAID6 to a BTRFS raid6 layout. I have a
>> drive that threw a few UNC errors during a subsequent balance, so I've
>> pulled that drive, dd'd /dev/zero to it, and am now trying to add it back
>> into the BTRFS system via:
>> btrfs replace start 4 /dev/xvdf /mnt/fileshare
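
As an aside, the progress of a running replace can be checked from another
shell, which at least distinguishes 'glacial but moving' from 'wedged'; a
minimal sketch against the same mount point:

$ btrfs replace status -1 /mnt/fileshare               # print progress once and exit
$ watch -n 60 btrfs replace status -1 /mnt/fileshare   # or poll it every minute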
>>
>> The command gets accepted and the replace starts - however apart from it
>> being at a glacial pace, it seems to hang the system hard with the
>> following output:
>>
>> zeus login: BTRFS info (device xvdc): not using ssd allocation scheme
>> BTRFS info (device xvdc): disk space caching is enabled
>> BUG: unable to handle kernel paging request at c90040eed000
>> IP: [] __memcpy+0x12/0x20
>> PGD 7fbc6067 PUD 7ce9e067 PMD 7b6d6067 PTE 0
>> Oops: 0002 [#1] SMP
>> Modules linked in: x86_pkg_temp_thermal coretemp crct10dif_pclmul btrfs
>> aesni_intel aes_x86_64 lrw gf128mul xor glue_helper ablk_helper cryptd
>> pcspkr raid6_pq nfsd auth
>> _rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel
>> xen_gntalloc xen_evtchn ipv6 autofs4
>> CPU: 1 PID: 2271 Comm: kworker/u4:4 Not tainted 4.4.13-1.el7xen.x86_64 #1
>> Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
>> task: 88007c5c83c0 ti: 88005b978000 task.ti: 88005b978000
>> RIP: e030:[]  [] __memcpy+0x12/0x20
>> RSP: e02b:88005b97bc60  EFLAGS: 00010246
>> RAX: c90040eecff8 RBX: 1000 RCX: 01ff
>> RDX:  RSI: 88004e4ec008 RDI: c90040eed000
>> RBP: 88005b97bd28 R08: c90040eeb000 R09: 607a06e1
>> R10: 7ff0 R11:  R12: 
>> R13: 607a26d9 R14: 607a26e1 R15: 880062c613d8
>> FS:  7fd08feb88c0() GS:88007f50()
>> knlGS:
>> CS:  e033 DS:  ES:  CR0: 80050033
>> CR2: c90040eed000 CR3: 56004000 CR4: 00042660
>> Stack:
>>  a05afc77 88005b97bc90 81190a14 1600
>>  88005ba01900  607a06e1 c90040eeb000
>>  0002 001b  0021
>> Call Trace:
>>  [] ? lzo_decompress_biovec+0x237/0x2b0 [btrfs]
>>  [] ? vmalloc+0x54/0x60
>>  [] end_compressed_bio_read+0x1d4/0x2a0 [btrfs]
>>  [] ? kmem_cache_free+0xcc/0x2c0
>>  [] bio_endio+0x40/0x60
>>  [] end_workqueue_fn+0x3c/0x40 [btrfs]
>>  [] normal_work_helper+0xc1/0x300 [btrfs]
>>  [] ? finish_task_switch+0x82/0x280
>>  [] btrfs_endio_helper+0x12/0x20 [btrfs]
>>  [] process_one_work+0x154/0x400
>>  [] worker_thread+0x11a/0x460
>>  [] ? rescuer_thread+0x2f0/0x2f0
>>  [] kthread+0xc9/0xe0
>>  [] ? kthread_park+0x60/0x60
>>  [] ret_from_fork+0x3f/0x70
>>  [] ? kthread_park+0x60/0x60
>> Code: 74 0e 48 8b 43 60 48 2b 43 50 88 43 4e 5b 5d c3 e8 b4 fc ff ff eb
>> eb 90 90 66 66 90 66 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07  48
>> a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
>> RIP  [] __memcpy+0x12/0x20
>>  RSP 
>> CR2: c90040eed000
>> ---[ end trace 001cc830ec804da7 ]---
>> BUG: unable to handle kernel paging request at ffd8
>> IP: [] kthread_data+0x10/0x20
>> PGD 188d067 PUD 188f067 PMD 0
>> Oops:  [#2] SMP
>> Modules linked in: x86_pkg_temp_thermal coretemp crct10dif_pclmul btrfs
>> aesni_intel aes_x86_64 lrw gf128mul xor glue_helper ablk_helper cryptd
>> pcspkr raid6_pq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
>> xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
>> CPU: 1 PID: 2271 Comm: kworker/u4:4 Tainted: G  D
>> 4.4.13-1.el7xen.x86_64 #1
>> task: 88007c5c83c0 ti: 88005b978000 task.ti: 88005b978000
>> RIP: e030:[]  []
>> kthread_data+0x10/0x20
>> RSP: e02b:88005b97b980  EFLAGS: 00010002
>> RAX:  RBX: 0001 RCX: 0001
>> RDX: 81bd5080 RSI: 0001 RDI: 88007c5c83c0
>> RBP: 88005b97b980 R08: 0001 R09: 13f6a632677e
>> R10: 001f R11:  R12: 88007c5c83c0
>> R13:  R14: 000163c0 R15: 0001
>> FS:  7fd08feb88c0() GS:88007f50()
>> knlGS:
>> CS:  e033 DS:  ES:  CR

system crash on replace

2016-06-23 Thread Steven Haigh
x3c/0x40 [btrfs]
 [] normal_work_helper+0xc1/0x300 [btrfs]
 [] ? finish_task_switch+0x82/0x280
 [] btrfs_endio_helper+0x12/0x20 [btrfs]
 [] process_one_work+0x154/0x400
 [] worker_thread+0x11a/0x460
 [] ? rescuer_thread+0x2f0/0x2f0
 [] kthread+0xc9/0xe0
 [] ? kthread_park+0x60/0x60
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_park+0x60/0x60
Code: ff ff eb 92 be 3a 02 00 00 48 c7 c7 de 81 7a 81 e8 d6 34 fe ff e9
d5 fe ff ff 90 66 66 66 66 90 55 48 8b 87 00 04 00 00 48 89 e5 <48> 8b
40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP  [] kthread_data+0x10/0x20
 RSP 
CR2: ffd8
---[ end trace 001cc830ec804da8 ]---
Fixing recursive fault but reboot is needed!

Current details:
$ btrfs fi show
Label: 'BTRFS'  uuid: 41cda023-1c96-4174-98ba-f29a3d38cd85
        Total devices 5 FS bytes used 4.81TiB
        devid    1 size 2.73TiB used 1.66TiB path /dev/xvdc
        devid    2 size 2.73TiB used 1.67TiB path /dev/xvdd
        devid    3 size 1.82TiB used 1.82TiB path /dev/xvde
        devid    5 size 1.82TiB used 1.82TiB path /dev/xvdg
        *** Some devices missing

$ btrfs fi df /mnt/fileshare
Data, RAID6: total=5.14TiB, used=4.81TiB
System, RAID6: total=160.00MiB, used=608.00KiB
Metadata, RAID6: total=6.19GiB, used=5.19GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

[   12.833431] Btrfs loaded
[   12.834339] BTRFS: device label BTRFS devid 1 transid 46614 /dev/xvdc
[   12.861109] BTRFS: device label BTRFS devid 0 transid 46614 /dev/xvdf
[   12.861357] BTRFS: device label BTRFS devid 3 transid 46614 /dev/xvde
[   12.862859] BTRFS: device label BTRFS devid 2 transid 46614 /dev/xvdd
[   12.863372] BTRFS: device label BTRFS devid 5 transid 46614 /dev/xvdg
[  379.629911] BTRFS info (device xvdg): allowing degraded mounts
[  379.630028] BTRFS info (device xvdg): not using ssd allocation scheme
[  379.630120] BTRFS info (device xvdg): disk space caching is enabled
[  379.630207] BTRFS: has skinny extents
[  379.631024] BTRFS warning (device xvdg): devid 4 uuid
61ccce61-9787-453e-b793-1b86f8015ee1 is missing
[  383.675967] BTRFS info (device xvdg): continuing dev_replace from
 (devid 4) to /dev/xvdf @0%

I've currently cancelled the replace with a btrfs replace cancel
/mnt/fileshare - so at least the system doesn't crash completely in the
meantime - and mounted it with -o degraded until I can see what the deal
is here.
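
For completeness, the hold-the-fort sequence described in that paragraph
looks roughly like the following; the device name is illustrative, the mount
point is the one used throughout this thread:

$ btrfs replace cancel /mnt/fileshare
$ umount /mnt/fileshare
$ mount -o degraded /dev/xvdc /mnt/fileshare
$ btrfs device stats /mnt/fileshare    # per-device read/write/csum error counters

Note that an interrupted replace resumes automatically on the next mount -
that is the 'continuing dev_replace ... @0%' line in the log above - whereas
a cancelled one has to be started again from scratch.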

Any suggestions?

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897



signature.asc
Description: OpenPGP digital signature