Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-20 Thread Timothy Normand Miller
On Thu, Aug 20, 2015 at 7:38 AM, Austin S Hemmelgarn wrote:

> Just for reference, I've found that it is usually safer to delete the
> missing device first if possible, then add the new one and re-balance. There
> seem to be some edge-cases in the code for deleting missing devices.
>

The problem is that you can't do that if there's not enough space on
the remaining devices to hold all the data.
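For reference, the two orderings being compared can be sketched as below.
This is a dry-run sketch only: /dev/sde and /mnt/btrfs are placeholder
names, the function names are made up for illustration, and nothing is
executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Ordering A: delete the missing device first, then add the replacement
# and re-balance. Only possible when the remaining devices have enough
# free space to hold all the data.
replace_delete_first() {
    mnt=$1; new_dev=$2
    run btrfs device delete missing "$mnt"
    run btrfs device add "$new_dev" "$mnt"
    run btrfs balance start "$mnt"
}

# Ordering B: add the replacement first so there is room for the data,
# then delete the missing device.
replace_add_first() {
    mnt=$1; new_dev=$2
    run btrfs device add "$new_dev" "$mnt"
    run btrfs device delete missing "$mnt"
}

replace_add_first /mnt/btrfs /dev/sde
```

Both functions assume the filesystem is already mounted degraded at the
given mount point.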


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-19 Thread Timothy Normand Miller
On Wed, Aug 19, 2015 at 1:22 AM, Qu Wenruo  wrote:
>
>
> Timothy Normand Miller wrote on 2015/08/18 22:55 -0400:
>>
>> On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo 
>> wrote:
>>>
>>>
>>>
>>> Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
>>>>
>>>>
>>>> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo 
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi Timothy,
>>>>>
>>>>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>>>>> discuss it on the mailing list, as it's not a kernel bug.
>>>>>
>>>>
>>>> All four devices were online.  The "missing" one was a drive that
>>>> died, which was replaced by a new one, but btrfs wouldn't finish the
>>>> deletion of the missing device.
>>>>
>>> By replaced, did you mean "btrfs replace"? Or just change the physical
>>> disk
>>> without using "btrfs replace"?
>>
>>
>> Here's what happened:
>>
>> - A drive started throwing bad sectors.  Somehow this caused metadata
>> on other drives to get messed up.
>
>
> Did that cause any huge damage?

It seems that metadata was damaged on all drives.

>
>> - I took that drive offline and mounted degraded (it's a 4-drive RAID1)
>> - I did a "btrfs add" on a new drive and then a "btrfs delete missing"
>> - The replacement drive failed during the replacement operation, and
>> everything went to crap.
>> - With some help, I got a kernel patch that allowed me to mount the
>> original three drives with TWO missing devices.
>
>
> So the original 3 drives are still OK,
> original bad one is missing, and the newly add one is also missing?
>
> That sounds quite repairable.

Nothing I tried would run to completion.  There were always errors.

>
>> - I added a brand new drive and then did "delete missing" again.  This
>> time, the first "delete missing" was successful, but it didn't fully
>> balance the drives, and there was another missing device, so I had to
>> do a "delete missing" again, and that failed.
>>
>> I wanted to get this back online and restored from a backup, but I was
>> willing to keep it this way if people wanted to probe at it, in case we
>> can uncover any btrfs bugs.  So it was suggested that I get a metadata
>> image, but that ran into some kind of bug in btrfs-image.
>
> If btrfs-image doesn't work, you can also try btrfs-debug-tree.
> IIRC, debug-tree should be more robust than btrfs-image.
>
> BTW, have you tried btrfsck on it? Does it also cause the infinite loop?
>
> I'll also try to reproduce it and investigate the code directly.

Well, I had to get things back online, so I've restored from backup.
I do have what limited metadata image I could get from btrfs-image.

>
> Thanks,
> Qu
>
>>
>> Currently, I'm restoring from backup, but I have at least a partial
>> metadata dump.
>>
>>
>





Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-18 Thread Timothy Normand Miller
On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo  wrote:
>
>
> Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
>>
>> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo 
>> wrote:
>>>
>>> Hi Timothy,
>>>
>>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>>> discuss it on the mailing list, as it's not a kernel bug.
>>>
>>
>> All four devices were online.  The "missing" one was a drive that
>> died, which was replaced by a new one, but btrfs wouldn't finish the
>> deletion of the missing device.
>>
> By replaced, did you mean "btrfs replace"? Or just change the physical disk
> without using "btrfs replace"?

Here's what happened:

- A drive started throwing bad sectors.  Somehow this caused metadata
on other drives to get messed up.
- I took that drive offline and mounted degraded (it's a 4-drive RAID1)
- I did a "btrfs add" on a new drive and then a "btrfs delete missing"
- The replacement drive failed during the replacement operation, and
everything went to crap.
- With some help, I got a kernel patch that allowed me to mount the
original three drives with TWO missing devices.
- I added a brand new drive and then did "delete missing" again.  This
time, the first "delete missing" was successful, but it didn't fully
balance the drives, and there was another missing device, so I had to
do a "delete missing" again, and that failed.

I wanted to get this back online and restored from a backup, but I was
willing to keep it this way if people wanted to probe at it, in case we
can uncover any btrfs bugs.  So it was suggested that I get a metadata
image, but that ran into some kind of bug in btrfs-image.

Currently, I'm restoring from backup, but I have at least a partial
metadata dump.




Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-18 Thread Timothy Normand Miller
On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo  wrote:
> Hi Timothy,
>
> Although I have replied to the bugzilla, IMHO it's more appropriate to
> discuss it on the mailing list, as it's not a kernel bug.
>

All four devices were online.  The "missing" one was a drive that
died, which was replaced by a new one, but btrfs wouldn't finish the
deletion of the missing device.



Re: So, wipe it out and start over or keep debugging?

2015-08-18 Thread Timothy Normand Miller
I was doing it on an unmounted volume anyhow.

On Tue, Aug 18, 2015 at 5:09 PM, Chris Murphy  wrote:
> On Tue, Aug 18, 2015 at 5:21 AM, Austin S Hemmelgarn
>  wrote:
>> On 2015-08-17 14:52, Timothy Normand Miller wrote:
>>>
>>> I'm not sure if I'm doing this wrong.  Here's what I'm seeing:
>>>
>>> # btrfs-image -c9 -t4 -w /mnt/btrfs ~/btrfs_dump.z
>>> Superblock bytenr is larger than device size
>>> Open ctree failed
>>> create failed (No such file or directory)
>>
>>
>> For the source, you need to specify the underlying block device, not the top
>> of the mounted filesystem.  It's trying to read the directory as a block
>> device and getting very confused.  We should probably add some kind of check
>> to btrfs-image to warn about that.
>
> Should it even be possible to use btrfs-image on a mounted volume? If
> it's written to at all, the collected image is going to be
> inconsistent.
>
> --
> Chris Murphy
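
The check suggested in the quoted reply can be approximated with a small
wrapper. This is only a sketch: the function name dump_metadata is made
up, the btrfs-image flags are the ones from the command in this thread,
and it does not check whether the device is currently mounted. It prints
the command instead of running it unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Refuse to call btrfs-image unless the source is a block device.
dump_metadata() {
    src=$1; out=$2
    if [ ! -b "$src" ]; then
        echo "error: $src is not a block device (pass the underlying device, not the mount point)" >&2
        return 1
    fi
    # Same options as in the thread: -c9 -t4 -w
    set -- btrfs-image -c9 -t4 -w "$src" "$out"
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Passing a directory such as the mount point is rejected:
dump_metadata /mnt/btrfs "$HOME/btrfs_dump.z" || echo "refused"
```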





Re: chattr +C on subvolume

2015-08-18 Thread Timothy Normand Miller
Never mind on that last lsattr question.  I needed a "-d" option.  Silly me.  :)
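
Without -d, lsattr lists the entries inside a directory, so an empty
subvolume prints nothing; with -d it reports the directory's own flags.
A minimal sketch of checking for the C (No_COW) flag follows. The helper
name has_nocow is made up for illustration, and it only inspects the
flags field of one line of "lsattr -d" output.

```shell
#!/bin/sh
# has_nocow LINE: succeed if an "lsattr -d" output line carries the C flag.
# Only the first word (the flags field) is examined, so a "C" that
# appears in the path itself does not count.
has_nocow() {
    flags=${1%% *}
    case "$flags" in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Intended use on a real system (path from this thread):
#   has_nocow "$(lsattr -d /mnt/btrfs/vms)" && echo "No_COW is set"
```

The attribute is stored on the directory's inode, so it should not
matter which mount point is used to reach it.  Note that +C on a
directory only affects files created in it afterwards.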

On Tue, Aug 18, 2015 at 1:39 PM, Timothy Normand Miller wrote:
> Another weird thing I've noticed.  I did this:
>
> chattr +C /mnt/btrfs/vms
>
> But both of these report nothing:
>
> lsattr /mnt/btrfs/vms
> lsattr /mnt/vms
>
> Shouldn't at least one show the C attribute?
>
>
> On Tue, Aug 18, 2015 at 1:36 PM, Timothy Normand Miller
>  wrote:
>> Maybe this is a dumb question, but there are always corner cases.
>>
>> I have a subvolume where I want to disable CoW for VM disks.  Maybe
>> that's a dumb idea, but that's a recommendation I've seen here and
>> there.  Now, in the docs I've seen, +C applies to a directory.  Does
>> it apply to subvolumes?  And do I apply it to the subvolume within the
>> main volume, or do I apply it to the mount point where I've mounted
>> the subvolume separately?  Are there any cases where the flag applies
>> or not depending on how you access the files?
>>
>> The same subvolume for me is accessible via /mnt/btrfs/vms (via the
>> /mnt/btrfs mount point) and /mnt/vms (where the subvolume is mounted).
>> I applied +C to /mnt/btrfs/vms.  So what I'm trying to find out is if
>> it also applies when files are accessed via /mnt/vms.
>>
>> Thanks.
>>
>>
>
>
>





Re: chattr +C on subvolume

2015-08-18 Thread Timothy Normand Miller
Another weird thing I've noticed.  I did this:

chattr +C /mnt/btrfs/vms

But both of these report nothing:

lsattr /mnt/btrfs/vms
lsattr /mnt/vms

Shouldn't at least one show the C attribute?


On Tue, Aug 18, 2015 at 1:36 PM, Timothy Normand Miller wrote:
> Maybe this is a dumb question, but there are always corner cases.
>
> I have a subvolume where I want to disable CoW for VM disks.  Maybe
> that's a dumb idea, but that's a recommendation I've seen here and
> there.  Now, in the docs I've seen, +C applies to a directory.  Does
> it apply to subvolumes?  And do I apply it to the subvolume within the
> main volume, or do I apply it to the mount point where I've mounted
> the subvolume separately?  Are there any cases where the flag applies
> or not depending on how you access the files?
>
> The same subvolume for me is accessible via /mnt/btrfs/vms (via the
> /mnt/btrfs mount point) and /mnt/vms (where the subvolume is mounted).
> I applied +C to /mnt/btrfs/vms.  So what I'm trying to find out is if
> it also applies when files are accessed via /mnt/vms.
>
> Thanks.
>
>





chattr +C on subvolume

2015-08-18 Thread Timothy Normand Miller
Maybe this is a dumb question, but there are always corner cases.

I have a subvolume where I want to disable CoW for VM disks.  Maybe
that's a dumb idea, but that's a recommendation I've seen here and
there.  Now, in the docs I've seen, +C applies to a directory.  Does
it apply to subvolumes?  And do I apply it to the subvolume within the
main volume, or do I apply it to the mount point where I've mounted
the subvolume separately?  Are there any cases where the flag applies
or not depending on how you access the files?

The same subvolume for me is accessible via /mnt/btrfs/vms (via the
/mnt/btrfs mount point) and /mnt/vms (where the subvolume is mounted).
I applied +C to /mnt/btrfs/vms.  So what I'm trying to find out is if
it also applies when files are accessed via /mnt/vms.

Thanks.




btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-18 Thread Timothy Normand Miller
I've filed a bug report on this:

https://bugzilla.kernel.org/show_bug.cgi?id=103081



Re: So, wipe it out and start over or keep debugging?

2015-08-18 Thread Timothy Normand Miller
I ran the following command.  It spent a lot of time creating a
1672450048 byte file.  Then it stopped writing to the file and started
using 100% CPU.  It's currently doing no I/O, and it's been doing that
for a while now.  Is that supposed to happen?

On Tue, Aug 18, 2015 at 9:30 AM, Timothy Normand Miller wrote:
> In that case, do I need to do all four block devices separately, or
> will the tool figure it out?
>
> On Tue, Aug 18, 2015 at 7:21 AM, Austin S Hemmelgarn
>  wrote:
>> On 2015-08-17 14:52, Timothy Normand Miller wrote:
>>>
>>> I'm not sure if I'm doing this wrong.  Here's what I'm seeing:
>>>
>>> # btrfs-image -c9 -t4 -w /mnt/btrfs ~/btrfs_dump.z
>>> Superblock bytenr is larger than device size
>>> Open ctree failed
>>> create failed (No such file or directory)
>>
>>
>> For the source, you need to specify the underlying block device, not the top
>> of the mounted filesystem.  It's trying to read the directory as a block
>> device and getting very confused.  We should probably add some kind of check
>> to btrfs-image to warn about that.
>>
>>
>
>
>





Re: So, wipe it out and start over or keep debugging?

2015-08-18 Thread Timothy Normand Miller
In that case, do I need to do all four block devices separately, or
will the tool figure it out?

On Tue, Aug 18, 2015 at 7:21 AM, Austin S Hemmelgarn wrote:
> On 2015-08-17 14:52, Timothy Normand Miller wrote:
>>
>> I'm not sure if I'm doing this wrong.  Here's what I'm seeing:
>>
>> # btrfs-image -c9 -t4 -w /mnt/btrfs ~/btrfs_dump.z
>> Superblock bytenr is larger than device size
>> Open ctree failed
>> create failed (No such file or directory)
>
>
> For the source, you need to specify the underlying block device, not the top
> of the mounted filesystem.  It's trying to read the directory as a block
> device and getting very confused.  We should probably add some kind of check
> to btrfs-image to warn about that.
>
>





Re: So, wipe it out and start over or keep debugging?

2015-08-17 Thread Timothy Normand Miller
I'm not sure if I'm doing this wrong.  Here's what I'm seeing:

# btrfs-image -c9 -t4 -w /mnt/btrfs ~/btrfs_dump.z
Superblock bytenr is larger than device size
Open ctree failed
create failed (No such file or directory)


On Mon, Aug 17, 2015 at 7:43 AM, Austin S Hemmelgarn wrote:
> On 2015-08-15 17:46, Timothy Normand Miller wrote:
>>
>> To those of you who have been helping out with my 4-drive RAID1
>> situation, is there anything further we should do to investigate this,
>> in case we can uncover any more bugs, or should I just wipe everything
>> out and restore from backup?
>>
> If you need the system back online, then my suggestion would be to use
> btrfs-image to get metadata images of the disks (there's an option to clear
> out private data if need be), and then restore from backup.  That way, we
> still have the problematic images to work with and examine.
>





So, wipe it out and start over or keep debugging?

2015-08-15 Thread Timothy Normand Miller
To those of you who have been helping out with my 4-drive RAID1
situation, is there anything further we should do to investigate this,
in case we can uncover any more bugs, or should I just wipe everything
out and restore from backup?



Re: "delete missing" with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
Here's the associated bug report with the full dmesg:

https://bugzilla.kernel.org/show_bug.cgi?id=102941

On Sat, Aug 15, 2015 at 9:13 AM, Timothy Normand Miller wrote:
> So I tried deleting the files that I think are the problem, and the
> file system went suddenly read-only, and I got this in dmesg:
>
> A bunch of these first messages:
> [39710.420118]  item 45 key (1668296151040 168 524288) itemoff 1557 itemsize 53
> [39710.420118]  extent refs 1 gen 166914 flags 1
> [39710.420119]  extent data backref root 949 objectid 440675
> offset 2621440 count 1
> [39710.420120]  item 46 key (1668296675328 168 524288) itemoff 1504 itemsize 53
> [39710.420120]  extent refs 1 gen 166914 flags 1
> [39710.420121]  extent data backref root 949 objectid 440675
> offset 3145728 count 1
> [39710.420121]  item 47 key (1668297199616 168 524288) itemoff 1451 itemsize 53
> [39710.420122]  extent refs 1 gen 166914 flags 1
> [39710.420122]  extent data backref root 949 objectid 440675
> offset 3670016 count 1
> [39710.420123]  item 48 key (1668297723904 168 524288) itemoff 1398 itemsize 53
> [39710.420123]  extent refs 1 gen 166914 flags 1
> [39710.420124]  extent data backref root 949 objectid 440675
> offset 4194304 count 1
> [39710.420125]  item 49 key (1668298248192 168 524288) itemoff 1345 itemsize 53
> [39710.420125]  extent refs 1 gen 166914 flags 1
> [39710.420126]  extent data backref root 949 objectid 440675
> offset 4718592 count 1
> [39710.420126]  item 50 key (1668298772480 168 524288) itemoff 1292 itemsize 53
> [39710.420127]  extent refs 1 gen 166914 flags 1
> [39710.420127]  extent data backref root 949 objectid 440675
> offset 5242880 count 1
> [39710.420128] BTRFS error (device sdc): unable to find ref byte nr
> 1668272218112 parent 0 root 949  owner 1032823 offset 655360
> [39710.420129] BTRFS: error (device sdc) in __btrfs_free_extent:6232:
> errno=-2 No such entry
> [39710.420131] BTRFS: error (device sdc) in
> btrfs_run_delayed_refs:2821: errno=-2 No such entry
> [39710.431108] pending csums is 5795840
>
> On Sat, Aug 15, 2015 at 8:51 AM, Timothy Normand Miller
>  wrote:
>> I didn't quite understand "profile and convert", since I can't find a
>> profile option.  Is this something your patch adds?
>>
>> Before I do that, however, I have to deal with this:
>>
>> compute0 ~ # btrfs device delete missing /mnt/btrfs
>> ERROR: error removing the device 'missing' - Input/output error
>>
>> [13058.298763] BTRFS warning (device sdc): csum failed ino 596 off
>> 623218688 csum 2756583412 expected csum 4104700738
>> [13058.298775] BTRFS warning (device sdc): csum failed ino 596 off
>> 623222784 csum 2568037276 expected csum 275151414
>> [13058.298782] BTRFS warning (device sdc): csum failed ino 596 off
>> 623226880 csum 2227564114 expected csum 3824181799
>> [13058.298788] BTRFS warning (device sdc): csum failed ino 596 off
>> 623230976 csum 3298529275 expected csum 1155389604
>> [13058.298794] BTRFS warning (device sdc): csum failed ino 596 off
>> 623235072 csum 2603391790 expected csum 1861925401
>> [13058.298801] BTRFS warning (device sdc): csum failed ino 596 off
>> 623239168 csum 2044148708 expected csum 3227559459
>> [13058.298807] BTRFS warning (device sdc): csum failed ino 596 off
>> 623243264 csum 615351306 expected csum 2720021058
>> [13058.329747] BTRFS warning (device sdc): csum failed ino 596 off
>> 623218688 csum 2756583412 expected csum 4104700738
>> [13058.329759] BTRFS warning (device sdc): csum failed ino 596 off
>> 623222784 csum 2568037276 expected csum 275151414
>> [13058.329770] BTRFS warning (device sdc): csum failed ino 596 off
>> 623226880 csum 2227564114 expected csum 3824181799
>>
>> Because of this, it won't delete the missing device.  How do I get
>> past this?  I'm pretty sure the problem is in some files I want to
>> delete anyhow.  Would deleting them solve the problem?
>>
>> On Sat, Aug 15, 2015 at 12:59 AM, Anand Jain  wrote:
>>>
>>>> BTW, when this is all over with, how do I make sure there are really
>>>> two copies of everything?  Will a scrub verify this?  Should I run a
>>>> balance operation?
>>>
>>> pls use 'btrfs bal profile and convert' to migrate single chunk (if any
>>> created when there were lesser number of RW-able devices) back to your
>>> desired raid1. Do this when all the devices are back online. Kindly note
>>> there is a bug in the btrfs VM that you won't be able to br

Re: "delete missing" with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
Oh, it went read-only because it OOPSed:

[39710.419966] ------------[ cut here ]------------
[39710.419969] WARNING: CPU: 1 PID: 5624 at
fs/btrfs/extent-tree.c:6226 __btrfs_free_extent+0x873/0xc80()
[39710.419970] Modules linked in: nfsd auth_rpcgss oid_registry
nfs_acl ipv6 binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek
ppdev snd_hda_codec_generic x86_pkg_temp_thermal coretemp kvm_intel
snd_hda_intel snd_hda_controller kvm snd_hda_codec snd_hda_core
microcode snd_hwdep pcspkr snd_pcm snd_timer i2c_i801 snd lpc_ich
mfd_core parport_pc battery xts gf128mul aes_x86_64 cbc sha256_generic
libiscsi scsi_transport_iscsi tg3 ptp pps_core libphy sky2 r8169
pcnet32 mii e1000 bnx2 fuse nfs lockd grace sunrpc reiserfs multipath
linear raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx raid1 raid0 dm_snapshot dm_bufio dm_crypt dm_mirror
dm_region_hash dm_log dm_mod firewire_core hid_sunplus hid_sony
hid_samsung hid_pl hid_petalynx hid_gyration usbhid uhci_hcd
usb_storage ehci_pci
[39710.419991]  ehci_hcd aic94xx libsas qla2xxx megaraid_sas
megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx
3w_ mptsas scsi_transport_sas mptfc scsi_transport_fc mptspi
mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx
gdth advansys initio BusLogic arcmsr aic7xxx aic79xx
scsi_transport_spi sg sata_mv sata_sil24 sata_sil pata_marvell
[39710.420003] CPU: 1 PID: 5624 Comm: kworker/u8:7 Tainted: G        W       4.1.4-gentoo #1
[39710.420003] Hardware name: ECS H87H3-M/H87H3-M, BIOS 4.6.5 07/16/2013
[39710.420005] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[39710.420006]   8197e672 81794418

[39710.420008]  81049cbc 01846cc5e000 880064d12000
e000
[39710.420009]  fffe  8127bc03
000fc277
[39710.420010] Call Trace:
[39710.420012]  [] ? dump_stack+0x40/0x50
[39710.420014]  [] ? warn_slowpath_common+0x7c/0xb0
[39710.420015]  [] ? __btrfs_free_extent+0x873/0xc80
[39710.420018]  [] ? cpumask_next_and+0x30/0x50
[39710.420019]  [] ? enqueue_task_fair+0x2c3/0xdb0
[39710.420021]  [] ? btrfs_delayed_ref_lock+0x2c/0x260
[39710.420022]  [] ? __btrfs_run_delayed_refs+0x42c/0x1280
[39710.420024]  [] ? __sb_start_write+0x3d/0xe0
[39710.420025]  [] ? btrfs_run_delayed_refs.part.58+0x5e/0x270
[39710.420026]  [] ? delayed_ref_async_start+0x78/0x90
[39710.420028]  [] ? normal_work_helper+0x73/0x2a0
[39710.420029]  [] ? process_one_work+0x13c/0x3d0
[39710.420031]  [] ? worker_thread+0x63/0x480
[39710.420032]  [] ? process_one_work+0x3d0/0x3d0
[39710.420033]  [] ? kthread+0xce/0xf0
[39710.420034]  [] ? kthread_create_on_node+0x180/0x180
[39710.420036]  [] ? ret_from_fork+0x42/0x70
[39710.420037]  [] ? kthread_create_on_node+0x180/0x180
[39710.420038] ---[ end trace 0b4fe6057cd7a1a4 ]---

On Sat, Aug 15, 2015 at 9:13 AM, Timothy Normand Miller wrote:
> So I tried deleting the files that I think are the problem, and the
> file system went suddenly read-only, and I got this in dmesg:
>
> A bunch of these first messages:
> [39710.420118]  item 45 key (1668296151040 168 524288) itemoff 1557 itemsize 53
> [39710.420118]  extent refs 1 gen 166914 flags 1
> [39710.420119]  extent data backref root 949 objectid 440675
> offset 2621440 count 1
> [39710.420120]  item 46 key (1668296675328 168 524288) itemoff 1504 itemsize 53
> [39710.420120]  extent refs 1 gen 166914 flags 1
> [39710.420121]  extent data backref root 949 objectid 440675
> offset 3145728 count 1
> [39710.420121]  item 47 key (1668297199616 168 524288) itemoff 1451 itemsize 53
> [39710.420122]  extent refs 1 gen 166914 flags 1
> [39710.420122]  extent data backref root 949 objectid 440675
> offset 3670016 count 1
> [39710.420123]  item 48 key (1668297723904 168 524288) itemoff 1398 itemsize 53
> [39710.420123]  extent refs 1 gen 166914 flags 1
> [39710.420124]  extent data backref root 949 objectid 440675
> offset 4194304 count 1
> [39710.420125]  item 49 key (1668298248192 168 524288) itemoff 1345 itemsize 53
> [39710.420125]  extent refs 1 gen 166914 flags 1
> [39710.420126]  extent data backref root 949 objectid 440675
> offset 4718592 count 1
> [39710.420126]  item 50 key (1668298772480 168 524288) itemoff 1292 itemsize 53
> [39710.420127]  extent refs 1 gen 166914 flags 1
> [39710.420127]  extent data backref root 949 objectid 440675
> offset 5242880 count 1
> [39710.420128] BTRFS error (device sdc): unable to find ref byte nr
> 1668272218112 parent 0 root 949  owner 1032823 offset 655360
> [39710.420129] BTRFS: error (device sdc) in __btrfs_free_extent:6232:
> errno=-2 No such entry
> [39710.420131] BTRFS: error (device sdc) in
> btrfs_run_delayed_refs:2821: errno=-2 No s

Re: "delete missing" with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
So I tried deleting the files that I think are the problem, and the
file system went suddenly read-only, and I got this in dmesg:

A bunch of these first messages:
[39710.420118]  item 45 key (1668296151040 168 524288) itemoff 1557 itemsize 53
[39710.420118]  extent refs 1 gen 166914 flags 1
[39710.420119]  extent data backref root 949 objectid 440675
offset 2621440 count 1
[39710.420120]  item 46 key (1668296675328 168 524288) itemoff 1504 itemsize 53
[39710.420120]  extent refs 1 gen 166914 flags 1
[39710.420121]  extent data backref root 949 objectid 440675
offset 3145728 count 1
[39710.420121]  item 47 key (1668297199616 168 524288) itemoff 1451 itemsize 53
[39710.420122]  extent refs 1 gen 166914 flags 1
[39710.420122]  extent data backref root 949 objectid 440675
offset 3670016 count 1
[39710.420123]  item 48 key (1668297723904 168 524288) itemoff 1398 itemsize 53
[39710.420123]  extent refs 1 gen 166914 flags 1
[39710.420124]  extent data backref root 949 objectid 440675
offset 4194304 count 1
[39710.420125]  item 49 key (1668298248192 168 524288) itemoff 1345 itemsize 53
[39710.420125]  extent refs 1 gen 166914 flags 1
[39710.420126]  extent data backref root 949 objectid 440675
offset 4718592 count 1
[39710.420126]  item 50 key (1668298772480 168 524288) itemoff 1292 itemsize 53
[39710.420127]  extent refs 1 gen 166914 flags 1
[39710.420127]  extent data backref root 949 objectid 440675
offset 5242880 count 1
[39710.420128] BTRFS error (device sdc): unable to find ref byte nr
1668272218112 parent 0 root 949  owner 1032823 offset 655360
[39710.420129] BTRFS: error (device sdc) in __btrfs_free_extent:6232:
errno=-2 No such entry
[39710.420131] BTRFS: error (device sdc) in
btrfs_run_delayed_refs:2821: errno=-2 No such entry
[39710.431108] pending csums is 5795840

On Sat, Aug 15, 2015 at 8:51 AM, Timothy Normand Miller wrote:
> I didn't quite understand "profile and convert", since I can't find a
> profile option.  Is this something your patch adds?
>
> Before I do that, however, I have to deal with this:
>
> compute0 ~ # btrfs device delete missing /mnt/btrfs
> ERROR: error removing the device 'missing' - Input/output error
>
> [13058.298763] BTRFS warning (device sdc): csum failed ino 596 off
> 623218688 csum 2756583412 expected csum 4104700738
> [13058.298775] BTRFS warning (device sdc): csum failed ino 596 off
> 623222784 csum 2568037276 expected csum 275151414
> [13058.298782] BTRFS warning (device sdc): csum failed ino 596 off
> 623226880 csum 2227564114 expected csum 3824181799
> [13058.298788] BTRFS warning (device sdc): csum failed ino 596 off
> 623230976 csum 3298529275 expected csum 1155389604
> [13058.298794] BTRFS warning (device sdc): csum failed ino 596 off
> 623235072 csum 2603391790 expected csum 1861925401
> [13058.298801] BTRFS warning (device sdc): csum failed ino 596 off
> 623239168 csum 2044148708 expected csum 3227559459
> [13058.298807] BTRFS warning (device sdc): csum failed ino 596 off
> 623243264 csum 615351306 expected csum 2720021058
> [13058.329747] BTRFS warning (device sdc): csum failed ino 596 off
> 623218688 csum 2756583412 expected csum 4104700738
> [13058.329759] BTRFS warning (device sdc): csum failed ino 596 off
> 623222784 csum 2568037276 expected csum 275151414
> [13058.329770] BTRFS warning (device sdc): csum failed ino 596 off
> 623226880 csum 2227564114 expected csum 3824181799
>
> Because of this, it won't delete the missing device.  How do I get
> past this?  I'm pretty sure the problem is in some files I want to
> delete anyhow.  Would deleting them solve the problem?
>
> On Sat, Aug 15, 2015 at 12:59 AM, Anand Jain  wrote:
>>
>>> BTW, when this is all over with, how do I make sure there are really
>>> two copies of everything?  Will a scrub verify this?  Should I run a
>>> balance operation?
>>
>> Please use 'btrfs bal profile and convert' to migrate any single chunks
>> (created while fewer RW-able devices were present) back to your
>> desired raid1. Do this when all the devices are back online. Kindly note
>> there is a bug in the btrfs VM: you won't be able to bring a device
>> online without unmount -> mount (I am working on a fix). btrfs-progs output
>> will be wrong in this case, so don't depend too much on it.
>> To understand the state of the btrfs kernel volume I generally use:
>> https://patchwork.kernel.org/patch/5816011/
>>
>> In there, if bdev is null, it indicates the device is scanned but not yet
>> part of the VM. Then unmount -> mount will bring the device back into the VM.
>>
>>>> After applying Anand's patch, I was able to mount my 4-drive RAID1
>>>

Re: "delete missing" with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
I didn't quite understand "profile and convert", since I can't find a
profile option.  Is this something your patch adds?
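
For reference, "profile and convert" most likely refers to the balance filters rather than a separate "profile" option: in btrfs-progs the `convert=` filter is given to `btrfs balance start` through `-d` (data) and `-m` (metadata). The following is a minimal sketch under that reading, not something taken from the patch. The helper name `balance_convert_cmd` is hypothetical, and it only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the balance command that converts
# data and metadata chunks to the given profile.
# The "soft" modifier skips chunks that already have the target profile.
balance_convert_cmd() {
    profile=$1
    mountpoint=$2
    printf 'btrfs balance start -dconvert=%s,soft -mconvert=%s,soft %s\n' \
        "$profile" "$profile" "$mountpoint"
}

# Mount point taken from the commands in this thread.
balance_convert_cmd raid1 /mnt/btrfs
```

Printing instead of executing is a deliberate choice here, since a balance on a degraded or damaged volume is not something to start by accident.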

Before I do that, however, I have to deal with this:

compute0 ~ # btrfs device delete missing /mnt/btrfs
ERROR: error removing the device 'missing' - Input/output error

[13058.298763] BTRFS warning (device sdc): csum failed ino 596 off
623218688 csum 2756583412 expected csum 4104700738
[13058.298775] BTRFS warning (device sdc): csum failed ino 596 off
623222784 csum 2568037276 expected csum 275151414
[13058.298782] BTRFS warning (device sdc): csum failed ino 596 off
623226880 csum 2227564114 expected csum 3824181799
[13058.298788] BTRFS warning (device sdc): csum failed ino 596 off
623230976 csum 3298529275 expected csum 1155389604
[13058.298794] BTRFS warning (device sdc): csum failed ino 596 off
623235072 csum 2603391790 expected csum 1861925401
[13058.298801] BTRFS warning (device sdc): csum failed ino 596 off
623239168 csum 2044148708 expected csum 3227559459
[13058.298807] BTRFS warning (device sdc): csum failed ino 596 off
623243264 csum 615351306 expected csum 2720021058
[13058.329747] BTRFS warning (device sdc): csum failed ino 596 off
623218688 csum 2756583412 expected csum 4104700738
[13058.329759] BTRFS warning (device sdc): csum failed ino 596 off
623222784 csum 2568037276 expected csum 275151414
[13058.329770] BTRFS warning (device sdc): csum failed ino 596 off
623226880 csum 2227564114 expected csum 3824181799

Because of this, it won't delete the missing device.  How do I get
past this?  I'm pretty sure the problem is in some files I want to
delete anyhow.  Would deleting them solve the problem?
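
To see which files the warnings refer to, the inode numbers can be pulled out of the kernel log and then resolved to paths. This is a rough sketch, assuming the log lines have the "csum failed ino N" form shown above; the function name `csum_failed_inodes` is made up for illustration. Inode numbers in btrfs are per subvolume, so the path given to `inode-resolve` is assumed to be the subvolume that holds the inode.

```shell
#!/bin/sh
# Read kernel log text on stdin and print every inode number that appears in
# a "csum failed ino N" warning, once per inode, in numeric order.
csum_failed_inodes() {
    awk '/csum failed ino/ {
             for (i = 1; i < NF; i++)
                 if ($i == "ino") print $(i + 1)
         }' | sort -n -u
}

# Typical use (needs root, not run here):
#   dmesg | csum_failed_inodes
#   btrfs inspect-internal inode-resolve 596 /mnt/btrfs
```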

On Sat, Aug 15, 2015 at 12:59 AM, Anand Jain  wrote:
>
>> BTW, when this is all over with, how do I make sure there are really
>> two copies of everything?  Will a scrub verify this?  Should I run a
>> balance operation?
>
> Please use 'btrfs bal profile and convert' to migrate any single chunks
> (created while fewer RW-able devices were present) back to your
> desired raid1. Do this when all the devices are back online. Kindly note
> there is a bug in the btrfs VM: you won't be able to bring a device
> online without unmount -> mount (I am working on a fix). btrfs-progs output
> will be wrong in this case, so don't depend too much on it.
> To understand the state of the btrfs kernel volume I generally use:
> https://patchwork.kernel.org/patch/5816011/
>
> In there, if bdev is null, it indicates the device is scanned but not yet
> part of the VM. Then unmount -> mount will bring the device back into the VM.
>
>>> After applying Anand's patch, I was able to mount my 4-drive RAID1
>>> and bring a new fourth drive online.
>
>>> However, something weird happened
>>> where the first "delete missing" only deleted one missing drive and
>>> only did a partial duplication.  I've posted a bug report here:
>
> that seems to be normal to me. unless I am missing something else / clarity.
>
>
> Thanks, Anand



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: "delete missing" with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-14 Thread Timothy Normand Miller
BTW, when this is all over with, how do I make sure there are really
two copies of everything?  Will a scrub verify this?  Should I run a
balance operation?
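
One way to check for chunks that are not mirrored is to look at the profiles reported by `btrfs filesystem df`. The sketch below assumes that output uses lines of the form "Data, RAID1: total=..., used=..." and that GlobalReserve is always reported as single and can be ignored; the function name `non_raid1_groups` is hypothetical. As far as I understand it, a scrub verifies the copies that exist but does not create a second copy of chunks written with the single profile, so this check is a complement to a scrub, not a replacement.

```shell
#!/bin/sh
# Read "btrfs filesystem df" output on stdin and print every block group
# type whose profile is not RAID1 (GlobalReserve is skipped).
non_raid1_groups() {
    awk -F'[,:] *' 'NF >= 2 && $1 != "GlobalReserve" && $2 != "RAID1" {
        print $1 ", " $2
    }'
}

# Typical use (not run here):
#   btrfs filesystem df /mnt/btrfs | non_raid1_groups
# No output means every data, metadata and system chunk is RAID1.
```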

On Fri, Aug 14, 2015 at 11:29 PM, Timothy Normand Miller wrote:
> After applying Anand's patch, I was able to mount my 4-drive RAID1 and
> bring a new fourth drive online.  However, something weird happened
> where the first "delete missing" only deleted one missing drive and
> only did a partial duplication.  I've posted a bug report here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=102901
>



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


"delete missing" with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-14 Thread Timothy Normand Miller
After applying Anand's patch, I was able to mount my 4-drive RAID1 and
bring a new fourth drive online.  However, something weird happened
where the first "delete missing" only deleted one missing drive and
only did a partial duplication.  I've posted a bug report here:

https://bugzilla.kernel.org/show_bug.cgi?id=102901

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
I applied that patch to my 4.1.4, it mounted degraded, and now it's
balancing to the new drive.

Thanks for all the help!

On Fri, Aug 14, 2015 at 8:28 PM, Anand Jain  wrote:
>
>
>> Just to be clear, I removed the drive (the original failed drive) when
>> the power was off, then powered up, and then mounted degraded.  That's
>> not dangerous that I know of.
>
>
> patch has details. pls refer.
>>
>>
>> Where is this patch, and what kernel versions can this be applied to?
>
>
>
> https://patchwork.kernel.org/patch/7014141/
>
> It's on 4.3, but should apply cleanly to earlier kernels.
>
> thanks
> Anand



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
On Fri, Aug 14, 2015 at 7:49 PM, Anand Jain  wrote:
>

>>
>> - I had a drive fail, so I removed it and mounted degraded.
>
>
> That's a bit dangerous to do without the patch below. The patch has more
> details on why.

Just to be clear, I removed the drive (the original failed drive) when
the power was off, then powered up, and then mounted degraded.  That's
not dangerous that I know of.

>
>> - I hooked up a replacement drive, did an "add" on that one, and did a
>> "delete missing".
>> - During the rebalance, the replacement drive failed, there were OOPSes,
>> etc.
>> - Now, although all of my data is there, I can't mount degraded,
>> because btrfs is complaining that too many devices are missing (3 are
>> there, but it sees 2 missing).
>
>
>
> This is addressed in the patch
>
>   [PATCH 23/23] Btrfs: allow -o rw,degraded for single group profile
>

Where is this patch, and what kernel versions can this be applied to?



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
I'm not sure my situation is quite like the one you linked, so here's
my bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=102881

On Fri, Aug 14, 2015 at 2:44 PM, Chris Murphy  wrote:
> On Fri, Aug 14, 2015 at 12:12 PM, Timothy Normand Miller wrote:
>> Sorry about that empty email.  I hit a wrong key, and gmail decided to send.
>>
>> Anyhow, my replacement drive is going to arrive this evening, and I
>> need to know how to add it to my btrfs array.  Here's the situation:
>>
>> - I had a drive fail, so I removed it and mounted degraded.
>> - I hooked up a replacement drive, did an "add" on that one, and did a
>> "delete missing".
>> - During the rebalance, the replacement drive failed, there were OOPSes, etc.
>> - Now, although all of my data is there, I can't mount degraded,
>> because btrfs is complaining that too many devices are missing (3 are
>> there, but it sees 2 missing).
>
> It might be related to this (long) bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=92641
>
> While Btrfs RAID 1 can tolerate only a single device failure, what you
> have is an in-progress rebuild of a missing device. If it becomes
> missing, the volume should be no worse off than it was before. But
> Btrfs doesn't see it this way, instead it sees this as two separate
> missing devices and now too many devices missing and it refuses to
> proceed. And there's no mechanism to remove missing devices unless you
> can mount rw. So it's stuck.
>
>
>> So I could use some help with cleaning up this mess.  All the data is
>> there, so I need to know how to either force it to mount degraded, or
>> add and remove devices offline.  Where do I begin?
>
> You can try to ask on IRC. I have no ideas for this scenario, I've
> tried and failed. My case was throw away, what should still be
> possible is using btrfs restore.
>
>
>> Also, doesn't it seem a bit arbitrary that there are "too many
>> missing," when all of the data is there?  If I understand correctly,
>> all four drives in my RAID1 should all have copies of the metadata,
>
> No that's not correct. RAID 1 means 2 copies of metadata. In a 4
> device RAID 1 that's still only 2 copies. It is not n-way RAID 1.
>
> But that doesn't matter here, the problem is that Btrfs has a narrow
> idea of the volume, it assumes without context that once the number of
> devices is below the minimum, the volume can't be mounted. In reality,
> an exception exists if the failure is for an in-progress rebuild of a
> missing drive. That drive failing should mean the volume is no worse
> off than before but Btrfs doesn't know that.
>
> Pretty sure about that anyway.
>
>
>> and of the remaining three good drives, there should be one or two
>> copies of every data block.  So it's all there, but btrfs has decided,
>> based on the NUMBER of missing devices, that it won't mount.
>> Shouldn't it refuse to mount if it knows there is data missing?  For
>> that matter, why should it even refuse in that case?  So some data
>> might be missing, so it should throw some errors if you try to access
>> that missing data.  Right?
>
> I think no data is missing, no metadata is missing, and Btrfs is
> confused and stuck in this case.
>
> --
> Chris Murphy



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
Sorry about that empty email.  I hit a wrong key, and gmail decided to send.

Anyhow, my replacement drive is going to arrive this evening, and I
need to know how to add it to my btrfs array.  Here's the situation:

- I had a drive fail, so I removed it and mounted degraded.
- I hooked up a replacement drive, did an "add" on that one, and did a
"delete missing".
- During the rebalance, the replacement drive failed, there were OOPSes, etc.
- Now, although all of my data is there, I can't mount degraded,
because btrfs is complaining that too many devices are missing (3 are
there, but it sees 2 missing).

So I could use some help with cleaning up this mess.  All the data is
there, so I need to know how to either force it to mount degraded, or
add and remove devices offline.  Where do I begin?

Also, doesn't it seem a bit arbitrary that there are "too many
missing," when all of the data is there?  If I understand correctly,
all four drives in my RAID1 should all have copies of the metadata,
and of the remaining three good drives, there should be one or two
copies of every data block.  So it's all there, but btrfs has decided,
based on the NUMBER of missing devices, that it won't mount.
Shouldn't it refuse to mount if it knows there is data missing?  For
that matter, why should it even refuse in that case?  So some data
might be missing, so it should throw some errors if you try to access
that missing data.  Right?

Thanks!

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
My

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
Ok, here's what's happening.  A few years ago, I took my old WD green
drives and put them in a box as backups to a new array of Seagate
drives.  When one of those seagate drives failed (just out of
warranty, of course), I replaced it with one of the WD's.  That was
cooking along just fine until just a few days ago when it started
throwing bad sectors and for some reason caused btrfs to have lots of
problems with the system block on the other three drives.  I tried to
add the other spare and remove the old spare, but for whatever reason,
this second spare (which had been fine when I boxed it in an
anti-static bag), is now failing catastrophically.  Now that that has
happened, the btrfs volume is stuck in a funny state where it won't
mount in degraded mode, because it thinks there should be five
devices, but there are only the original three.

I'm going to go ahead and order a new drive.  Meanwhile, is there a
way to add and remove drives from volumes that can't be mounted?


On Wed, Aug 12, 2015 at 4:48 PM, Timothy Normand Miller wrote:
> Actually, it didn't resume.  The "btrfs delete missing" was using 100%
> of the I/O bandwidth but wasn't actually doing any disk reads or
> writes.  I tried to reboot, but the system wouldn't go down, so after
> waiting 10 minutes, I power-cycled.  Now I can't mount at all and
> here's what dmesg says about that:
>
> [  236.118419] BTRFS info (device sdb): allowing degraded mounts
> [  236.118421] BTRFS info (device sdb): disk space caching is enabled
> [  236.165470] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
> corrupt 0, gen 2
> [  245.883595] BTRFS: too many missing devices, writeable mount is not allowed
> [  245.946570] BTRFS: open_ctree failed
>
> It thinks now that there should be five devices, and since there are
> only three available, it won't let me mount.
>
> # btrfs filesystem show
> Label: none  uuid: 49ac9ad2-b529-4e6e-aef9-1c5b9e8a72f8
> Total devices 1 FS bytes used 28.26GiB
> devid    1 size 79.69GiB used 41.03GiB path /dev/sda3
>
> warning, device 1 is missing
> warning, device 1 is missing
> warning devid 1 not found already
> warning devid 5 not found already
> Label: none  uuid: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
> Total devices 5 FS bytes used 1.46TiB
> devid    2 size 931.51GiB used 767.00GiB path /dev/sdd
> devid    3 size 931.51GiB used 745.03GiB path /dev/sdc
> devid    4 size 931.51GiB used 767.00GiB path /dev/sdb
> *** Some devices missing
>
> btrfs-progs v4.1.2
>
>
>
> On Wed, Aug 12, 2015 at 4:27 PM, Timothy Normand Miller wrote:
>> It resumed on its own.  Weird.
>>
>> On Wed, Aug 12, 2015 at 4:23 PM, Timothy Normand Miller wrote:
>>> On Wed, Aug 12, 2015 at 2:10 PM, Chris Murphy wrote:
>>>
>>>>
>>>> Anyway it looks like it's hardware related, but I don't know what
>>>> device ata4.00 is, so maybe this helps:
>>>> http://superuser.com/questions/617192/mapping-ata-device-number-to-logical-device-name
>>>
>>> # ata=4; ls -l /sys/block/sd* | grep $(grep $ata
>>> /sys/class/scsi_host/host*/unique_id | awk -F'/' '{print $5}')
>>> lrwxrwxrwx 1 root root 0 Aug 12 16:21 /sys/block/sde ->
>>> ../devices/pci:00/:00:1f.5/ata4/host3/target3:0:0/3:0:0:0/block/sde
>>>
>>> sde is the newly attached drive, replacing the one that had appeared
>>> to have bad sectors.  So it looks like either this new motherboard has
>>> a bad connector, or the cable is bad.  I'm going to swap it out for a
>>> different SATA cable.  How do I resume the failed operation?  And
>>> should I reboot because of the OOPSes?



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
Actually, it didn't resume.  The "btrfs delete missing" was using 100%
of the I/O bandwidth but wasn't actually doing any disk reads or
writes.  I tried to reboot, but the system wouldn't go down, so after
waiting 10 minutes, I power-cycled.  Now I can't mount at all and
here's what dmesg says about that:

[  236.118419] BTRFS info (device sdb): allowing degraded mounts
[  236.118421] BTRFS info (device sdb): disk space caching is enabled
[  236.165470] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
corrupt 0, gen 2
[  245.883595] BTRFS: too many missing devices, writeable mount is not allowed
[  245.946570] BTRFS: open_ctree failed

It thinks now that there should be five devices, and since there are
only three available, it won't let me mount.

# btrfs filesystem show
Label: none  uuid: 49ac9ad2-b529-4e6e-aef9-1c5b9e8a72f8
Total devices 1 FS bytes used 28.26GiB
devid    1 size 79.69GiB used 41.03GiB path /dev/sda3

warning, device 1 is missing
warning, device 1 is missing
warning devid 1 not found already
warning devid 5 not found already
Label: none  uuid: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
Total devices 5 FS bytes used 1.46TiB
devid    2 size 931.51GiB used 767.00GiB path /dev/sdd
devid    3 size 931.51GiB used 745.03GiB path /dev/sdc
devid    4 size 931.51GiB used 767.00GiB path /dev/sdb
*** Some devices missing

btrfs-progs v4.1.2
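
The mismatch can be read straight off this output: the "Total devices" line says 5, while only three devid lines are listed. A small sketch that makes the comparison explicit, assuming the `btrfs filesystem show` layout seen above; the function name `device_counts` is invented for this example.

```shell
#!/bin/sh
# Read "btrfs filesystem show" output on stdin and print, per filesystem,
# the device count from the "Total devices" line next to the number of
# devid lines actually listed.
device_counts() {
    awk '
        function report() {
            printf "%s expected=%d present=%d missing=%d\n",
                   uuid, expected, present, expected - present
        }
        $1 == "Label:" {
            if (uuid != "") report()
            uuid = $NF; expected = 0; present = 0
        }
        $1 == "Total" && $2 == "devices" { expected = $3 }
        $1 ~ /^devid/ { present++ }
        END { if (uuid != "") report() }
    '
}

# Typical use (not run here):
#   btrfs filesystem show | device_counts
```

Lines starting with "warning" are ignored because only lines whose first field begins with "devid" are counted as present devices.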



On Wed, Aug 12, 2015 at 4:27 PM, Timothy Normand Miller wrote:
> It resumed on its own.  Weird.
>
> On Wed, Aug 12, 2015 at 4:23 PM, Timothy Normand Miller wrote:
>> On Wed, Aug 12, 2015 at 2:10 PM, Chris Murphy wrote:
>>
>>>
>>> Anyway it looks like it's hardware related, but I don't know what
>>> device ata4.00 is, so maybe this helps:
>>> http://superuser.com/questions/617192/mapping-ata-device-number-to-logical-device-name
>>
>> # ata=4; ls -l /sys/block/sd* | grep $(grep $ata
>> /sys/class/scsi_host/host*/unique_id | awk -F'/' '{print $5}')
>> lrwxrwxrwx 1 root root 0 Aug 12 16:21 /sys/block/sde ->
>> ../devices/pci:00/:00:1f.5/ata4/host3/target3:0:0/3:0:0:0/block/sde
>>
>> sde is the newly attached drive, replacing the one that had appeared
>> to have bad sectors.  So it looks like either this new motherboard has
>> a bad connector, or the cable is bad.  I'm going to swap it out for a
>> different SATA cable.  How do I resume the failed operation?  And
>> should I reboot because of the OOPSes?



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
It resumed on its own.  Weird.

On Wed, Aug 12, 2015 at 4:23 PM, Timothy Normand Miller wrote:
> On Wed, Aug 12, 2015 at 2:10 PM, Chris Murphy  wrote:
>
>>
>> Anyway it looks like it's hardware related, but I don't know what
>> device ata4.00 is, so maybe this helps:
>> http://superuser.com/questions/617192/mapping-ata-device-number-to-logical-device-name
>
> # ata=4; ls -l /sys/block/sd* | grep $(grep $ata
> /sys/class/scsi_host/host*/unique_id | awk -F'/' '{print $5}')
> lrwxrwxrwx 1 root root 0 Aug 12 16:21 /sys/block/sde ->
> ../devices/pci:00/:00:1f.5/ata4/host3/target3:0:0/3:0:0:0/block/sde
>
> sde is the newly attached drive, replacing the one that had appeared
> to have bad sectors.  So it looks like either this new motherboard has
> a bad connector, or the cable is bad.  I'm going to swap it out for a
> different SATA cable.  How do I resume the failed operation?  And
> should I reboot because of the OOPSes?



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
On Wed, Aug 12, 2015 at 2:10 PM, Chris Murphy  wrote:

>
> Anyway it looks like it's hardware related, but I don't know what
> device ata4.00 is, so maybe this helps:
> http://superuser.com/questions/617192/mapping-ata-device-number-to-logical-device-name

# ata=4; ls -l /sys/block/sd* | grep $(grep $ata
/sys/class/scsi_host/host*/unique_id | awk -F'/' '{print $5}')
lrwxrwxrwx 1 root root 0 Aug 12 16:21 /sys/block/sde ->
../devices/pci:00/:00:1f.5/ata4/host3/target3:0:0/3:0:0:0/block/sde

sde is the newly attached drive, replacing the one that had appeared
to have bad sectors.  So it looks like either this new motherboard has
a bad connector, or the cable is bad.  I'm going to swap it out for a
different SATA cable.  How do I resume the failed operation?  And
should I reboot because of the OOPSes?
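
A somewhat more readable variant of the lookup above is sketched here. It assumes the sysfs link target contains an "ataN" path component, as it does in the output shown; on systems where it does not, the unique_id approach from the linked answer is still needed. The function name `ata_to_blockdev` and the optional directory argument are made up for this example.

```shell
#!/bin/sh
# Print the block devices whose sysfs link passes through the given ATA port.
# $1: ATA port number (4 for ata4)
# $2: directory holding the block device links (defaults to /sys/block)
ata_to_blockdev() {
    port=$1
    dir=${2:-/sys/block}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        case "$(readlink "$link")" in
            */ata"$port"/*) basename "$link" ;;
        esac
    done
}

# Typical use (not run here):
#   ata_to_blockdev 4
```

The trailing slash in the pattern keeps ata4 from also matching ata40 and similar.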

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
I added a new device and then did a delete missing.  I lost the
terminal (should have used gnu screen), so I didn't see the stdout,
but the operation aborted at some point.  There's a ton of output in
dmesg related to this, along with some OOPSes, which I have attached
as "dmesg2" here:

https://bugzilla.kernel.org/show_bug.cgi?id=102691


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 5:24 PM, Chris Murphy  wrote:

>> There is still data redundancy.  Will a scrub at least notice that the
>> copies differ?
>
> No, that's what I mean by "nodatasum means no raid1 self-healing is
> possible". You have data redundancy, but without checksums btrfs has
> no way to know if they differ. It doesn't do two reads and compares
> them, it's just like md raid, it picks one device, and so long as
> there's no read error from the device, that copy of the data is
> assumed to be good.

Ok, that makes sense.  I'm guessing it wouldn't be worth it to add a
feature like this because (a) few people use nodatacow or end up in my
situation, and (b) if they did, and the two copies were inconsistent,
what would you do?  I suppose for me, it would be nice to know which
files were affected.
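
The difference between the two cases can be illustrated outside of btrfs with two ordinary files standing in for the two copies. This is only a toy sketch: POSIX `cksum` is used as a stand-in for the crc32c that btrfs uses, and the function names are invented. With a stored checksum the good copy can be identified; without one, the most that can be reported is that the copies differ.

```shell
#!/bin/sh
# Print a CRC for a file (POSIX cksum, used here only as a stand-in).
file_crc() {
    cksum < "$1" | awk '{ print $1 }'
}

# With a stored checksum: print the copy that matches it, or "none".
pick_good_copy() {
    expected=$1
    a=$2
    b=$3
    if [ "$(file_crc "$a")" = "$expected" ]; then
        echo "$a"
    elif [ "$(file_crc "$b")" = "$expected" ]; then
        echo "$b"
    else
        echo "none"
        return 1
    fi
}

# Without a stored checksum: only "same" or "differ" can be determined.
compare_copies() {
    if cmp -s "$1" "$2"; then
        echo "same"
    else
        echo "differ"
    fi
}
```

A "differ" result is what a compare-both-copies feature could report per file, which is the "which files were affected" information, but it cannot say which side is correct.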


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 4:48 PM, Chris Murphy  wrote:

>
> The compress is ignored, and it looks like nodatasum and nodatacow
> apply to everything. The nodatasum means no raid1 self-healing is
> possible for any data on the entire volume. Metadata checksumming is
> still enabled.

Ugh.  So I need to change my fstab file.  I swear, some expert on IRC
told me that this should work fine, which is why I did it.  In fact, I
think they recommended it on the basis that I wanted to put VM images
on one of the subvolumes.  This discussion occurred a long time ago,
well before RAID5 was even partially implemented.
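
A quick way to find the fstab entries in question is sketched below. It assumes the usual six-column fstab layout and simply lists btrfs entries whose options include nodatacow or nodatasum, which per the reply quoted above end up applying to the whole volume. The function name `btrfs_nocow_mounts` is hypothetical.

```shell
#!/bin/sh
# Read fstab-format lines on stdin and print the mount point and options of
# every btrfs entry whose option list contains nodatacow or nodatasum.
btrfs_nocow_mounts() {
    awk '$1 !~ /^#/ && $3 == "btrfs" &&
         ("," $4 ",") ~ /,(nodatacow|nodatasum),/ { print $2, $4 }'
}

# Typical use (not run here):
#   btrfs_nocow_mounts < /etc/fstab
# For VM images, "chattr +C" on an empty directory is the per-directory
# alternative to a volume-wide nodatacow mount option.
```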

There is still data redundancy.  Will a scrub at least notice that the
copies differ?


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 3:57 PM, Chris Murphy  wrote:
> On Tue, Aug 11, 2015 at 12:04 PM, Timothy Normand Miller wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=102691
>
> [7.729124] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c
> devid 2 transid 226237 /dev/sdd
> [7.746115] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c
> devid 4 transid 226237 /dev/sdb
> [7.826493] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c
> devid 3 transid 226237 /dev/sdc
>
> What do you get for 'btrfs fi show'

# btrfs fi show
Label: none  uuid: 49ac9ad2-b529-4e6e-aef9-1c5b9e8a72f8
Total devices 1 FS bytes used 28.33GiB
devid    1 size 79.69GiB used 41.03GiB path /dev/sda3

Label: none  uuid: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
Total devices 4 FS bytes used 1.46TiB
devid    2 size 931.51GiB used 767.00GiB path /dev/sdd
devid    3 size 931.51GiB used 760.03GiB path /dev/sdc
devid    4 size 931.51GiB used 767.00GiB path /dev/sdb
*** Some devices missing

Label: none  uuid: f9331766-e50a-43d5-98dc-fabf5c68321d
Total devices 1 FS bytes used 2.99TiB
devid    1 size 3.64TiB used 3.01TiB path /dev/sde1

btrfs-progs v4.1.2

>
> I see devid 2, 3, 4 only for this volume UUID. So you definitely
> appear to have a failed device and that's why it doesn't mount
> automatically at boot time. You just need to use -o degraded, and that
> should work assuming no problems with the other three devices. If it
> does work, 'btrfs replace start...' is the ideal way to replace the
> failed drive.

It's missing because I physically disconnected it.  Someone on IRC
suggested I try this in case the drive with the bad sector was
interfering.  Of course, now that I've done this and mounted
read/write, we can't reintegrate the failing drive.
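
For the record, the sequence suggested above would look roughly like the following. This is a sketch that only prints the commands. It assumes the missing device is devid 1 (devids 2, 3 and 4 are the ones listed), uses /dev/sdX as a placeholder for a replacement drive that does not exist yet, and the helper name `replace_missing_cmds` is made up. `btrfs replace start` accepts a devid in place of a source device, which is what makes it usable when the old drive is gone.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the degraded mount followed by the
# replace command.
# $1: any present member device   $2: devid of the missing device
# $3: replacement device          $4: mount point
replace_missing_cmds() {
    printf 'mount -o degraded %s %s\n' "$1" "$4"
    printf 'btrfs replace start %s %s %s\n' "$2" "$3" "$4"
}

# /dev/sdX is a placeholder, not a real device name from this system.
replace_missing_cmds /dev/sdb 1 /dev/sdX /mnt/btrfs
```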

If I lose the array, I won't cry.  The backup appears to be complete.
But it would be convenient to avoid having to restore from scratch,
and I'm hoping this might help you guys too in some way.  I really
like btrfs, and I would like to provide you with whatever info might
contribute something.

>
> Maybe someone else can say whether nodatacow as a subvolume mount
> option will apply this to the entire volume.

At the moment, I'm only trying to mount the whole volume, just so I
could recover and scrub it, although as I mentioned in my earlier
email, the scrub aborts with no report of why and with "0 errors."



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 3:47 PM, Chris Murphy  wrote:

>
> Huh. I thought nodatacow applies to an entire volume only, not per
> subvolume unless you use chattr +C (in which case it can be per
> subvolume, directory or per file). I could be confused, but I think
> you have mutually exclusive mount options.

Well, at the time I set up this system, I asked on IRC, and people
said it should work.  I've never seen any errors from this.
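
The chattr +C alternative mentioned in the quoted text could look something like the sketch below. It is not a statement about how this array was set up: the directory /mnt/vms/images is a hypothetical example, and the run helper is an invented dry-run wrapper that only prints the commands unless RUN=1 is set.

```shell
# Print each command; execute it only when RUN=1 is set (needs a btrfs mount).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Mark a directory No_COW so that files created in it afterwards inherit the flag.
# The attribute only takes effect for new or still-empty files.
set_nocow_dir() {
    run mkdir -p "$1" &&
    run chattr +C "$1" &&
    run lsattr -d "$1"      # a 'C' among the listed flags confirms the attribute
}
```

For example, "set_nocow_dir /mnt/vms/images" prints the plan. Existing files in the directory keep their current behaviour, so VM images would have to be copied in afterwards to pick up the flag.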


>>
>> [94312.091613] BTRFS info (device sdc): allowing degraded mounts
>> [94312.091618] BTRFS info (device sdc): disk space caching is enabled
>> [94312.194513] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45, corrupt 0, gen 2
>> [94319.824563] BTRFS: checking UUID tree
>
> I don't see any mount failure message. It worked then?

Yes and no.  It's mounted, but a scrub aborts silently:

# btrfs scrub status /mnt/btrfs/
scrub status for ecdff84d-b4a2-4286-a1c1-cd7e5396901c
scrub started at Tue Aug 11 13:56:36 2015 and was aborted after 01:31:55
total bytes scrubbed: 2.19TiB with 0 errors

No new messages appeared in dmesg, so I can't tell why it aborted.
It's also odd that it reports zero errors, given that it aborted.
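
Because the abort is only visible in the wording of the status line, a script that wants to notice it has to inspect that text. The helper below is a small sketch of that idea. The "was aborted" wording is taken from the output above; the "finished after" wording and the function name scrub_state are assumptions, and other btrfs-progs versions may print a different format.

```shell
# Classify the text printed by "btrfs scrub status".
# $1: the full status output as a single string
scrub_state() {
    case $1 in
        *"was aborted"*)    echo aborted ;;
        *"finished after"*) echo finished ;;
        *)                  echo unknown ;;   # still running, or an output format this sketch does not know
    esac
}
```

It would be used as: scrub_state "$(btrfs scrub status /mnt/btrfs/)". Running the scrub in the foreground with "btrfs scrub start -B -d /mnt/btrfs/" prints per-device statistics when it ends, which may narrow down where it stops.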



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 1:56 PM, Timothy Normand Miller
 wrote:
> On Tue, Aug 11, 2015 at 12:21 AM, Chris Murphy  
> wrote:

>> The entire dmesg is still useful because it should show libata errors
>> if these aren't fully failed drives. So you should file a bug and
>> include, literally, the entire unedited dmesg.
>
> Alright, I'll do that.  Thanks!
>

Here you go:

https://bugzilla.kernel.org/show_bug.cgi?id=102691

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 12:21 AM, Chris Murphy  wrote:
> On Mon, Aug 10, 2015 at 7:23 PM, Timothy Normand Miller
>  wrote:
>> On Mon, Aug 10, 2015 at 6:52 PM, Chris Murphy  
>> wrote:
>
>>> - complete dmesg for the failed mount
>>
>> It really doesn't say much.  I have things like this:
>> [8.643535] BTRFS info (device sdc): disk space caching is enabled
>> [8.643789] BTRFS: failed to read the system array on sdc
>> [8.706062] BTRFS: open_ctree failed
>> [8.707124] BTRFS info (device sdc): disk space caching is enabled
>> [8.710924] BTRFS: failed to read the system array on sdc
>> [8.766080] BTRFS: open_ctree failed
>> [8.766903] BTRFS info (device sdc): setting nodatacow, compression disabled
>> [8.766905] BTRFS info (device sdc): disk space caching is enabled
>> [8.767152] BTRFS: failed to read the system array on sdc
>> [8.936019] BTRFS: open_ctree failed
>> [8.936906] BTRFS info (device sdc): disk space caching is enabled
>> [8.939922] BTRFS: failed to read the system array on sdc
>> [8.995984] BTRFS: open_ctree failed
>> [8.996796] BTRFS info (device sdc): disk space caching is enabled
>> [8.997093] BTRFS: failed to read the system array on sdc
>> [9.125936] BTRFS: open_ctree failed
>
> It looks like there's not enough redundancy remaining to mount and in
> such a case there's really not much to be done.
>
> I don't see nodatacow in your fstab, so I don't know why that's
> happening. That means no checksumming for data.

Sorry.  I was dumb.  I only showed you the entry for what I was trying
to mount manually.  I have subvolumes, and this is what is in my
fstab:

UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /home btrfs compress=lzo,noatime,space_cache,subvol=home 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/btrfs btrfs compress=lzo,noatime,space_cache 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/vms btrfs noatime,nodatacow,space_cache,subvol=vms 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/oldfiles btrfs compress=lzo,noatime,space_cache,subvol=oldfiles 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/backup btrfs compress=lzo,noatime,space_cache,subvol=backup 0 2


>
>
>>
>> Also, when I manually try to mount, I get things like this:
>>
>> # mount /mnt/btrfs
>> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
>>    missing codepage or helper program, or other error
>
> Have you tried to mount with -o degraded?

Ooh!  I can do that!

Mounting ro,degraded, I see this:

[94197.902443] BTRFS info (device sdc): allowing degraded mounts
[94197.902448] BTRFS info (device sdc): disk space caching is enabled
[94198.240621] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45, corrupt 0, gen 2

Mounting rw,degraded, I see this:

[94312.091613] BTRFS info (device sdc): allowing degraded mounts
[94312.091618] BTRFS info (device sdc): disk space caching is enabled
[94312.194513] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45, corrupt 0, gen 2
[94319.824563] BTRFS: checking UUID tree
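
The "errs:" line carries five per-device counters (write, read, flush, corruption and generation errors). As a rough sketch, they can be pulled out of an unwrapped log line with sed; the function name parse_btrfs_errs and the key=value output format are made up for this illustration.

```shell
# Extract the error counters from a BTRFS "errs:" kernel log line.
# $1: the complete log line, unwrapped onto a single line
parse_btrfs_errs() {
    printf '%s\n' "$1" | sed -n \
        's/.*errs: wr \([0-9]*\), rd \([0-9]*\), flush \([0-9]*\), corrupt \([0-9]*\), gen \([0-9]*\).*/write=\1 read=\2 flush=\3 corrupt=\4 generation=\5/p'
}
```

Fed the line from the rw,degraded mount above, this yields write=1724 read=305 flush=45 corrupt=0 generation=2. On a mounted filesystem, "btrfs device stats /mnt/btrfs" reports the same kind of counters per device.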


>
>
>
>> Well, if I get something lengthy, I'll attach it to my bug report.
>> Did the information I reported help at all?
>
> The entire dmesg is still useful because it should show libata errors
> if these aren't fully failed drives. So you should file a bug and
> include, literally, the entire unedited dmesg.

Alright, I'll do that.  Thanks!

>
>
> --
> Chris Murphy



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-10 Thread Timothy Normand Miller
On Mon, Aug 10, 2015 at 6:52 PM, Chris Murphy  wrote:
> Four needed things:
> - kernel version

4.1.0-gentoo-r1, although I have also tried 4.1.4.

> - btrfs-progs version

4.1.2

> - complete dmesg for the failed mount

It really doesn't say much.  I have things like this:
[8.643535] BTRFS info (device sdc): disk space caching is enabled
[8.643789] BTRFS: failed to read the system array on sdc
[8.706062] BTRFS: open_ctree failed
[8.707124] BTRFS info (device sdc): disk space caching is enabled
[8.710924] BTRFS: failed to read the system array on sdc
[8.766080] BTRFS: open_ctree failed
[8.766903] BTRFS info (device sdc): setting nodatacow, compression disabled
[8.766905] BTRFS info (device sdc): disk space caching is enabled
[8.767152] BTRFS: failed to read the system array on sdc
[8.936019] BTRFS: open_ctree failed
[8.936906] BTRFS info (device sdc): disk space caching is enabled
[8.939922] BTRFS: failed to read the system array on sdc
[8.995984] BTRFS: open_ctree failed
[8.996796] BTRFS info (device sdc): disk space caching is enabled
[8.997093] BTRFS: failed to read the system array on sdc
[9.125936] BTRFS: open_ctree failed

Also, when I manually try to mount, I get things like this:

# mount /mnt/btrfs
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

For this fstab entry:
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/btrfs btrfs compress=lzo,noatime,space_cache 0 2

# mount -t btrfs /dev/sdd /mnt/btrfs
mount: wrong fs type, bad option, bad superblock on /dev/sdd,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.


> - complete btrfs check output (you mostly have this but since the
> version isn't included, it's not clear this is the entire output)

I pasted it all.

>
> The last two can be included as attachments in a bugzilla.kernel.org
> bug report and the URL posted in this thread. Typically MUA wrapping
> nerfs the dmesg making it hard to read, so attachments to a bug report
> are better.

Well, if I get something lengthy, I'll attach it to my bug report.
Did the information I reported help at all?  I think that btrfs just
isn't being informative about the problem.  Are there other commands I
can run to get more detailed reports?

BTW, I tried disconnecting the drive with the bad sector.  I still get
all the same errors and can't repair.
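
The four items requested at the top of this message can be gathered in one pass, roughly as sketched below. The function name collect_btrfs_report and the run helper are invented for this example; by default the helper only prints the commands and executes them only when RUN=1 is set.

```shell
# Print each command; execute it only when RUN=1 is set (needs root).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Gather the four items asked for in the thread.
# $1: one member device of the filesystem (it should be unmounted for the check)
collect_btrfs_report() {
    run uname -r            # kernel version
    run btrfs --version     # btrfs-progs version
    run dmesg               # complete, unedited kernel log
    run btrfs check "$1"    # read-only check, deliberately without --repair
}
```

Something like "RUN=1 collect_btrfs_report /dev/sde > report.txt 2>&1" would then give a single unwrapped file to attach to the bug report instead of pasting into mail.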

>
> Bugs get reported both in bugzilla and on the list.
> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#How_do_I_report_bugs_and_issues.3F
>
> Sometimes it takes a while for devs to respond, they also get worked
> on even without responses just because there's so many improvements
> each release.
>
>
> --
> Chris Murphy



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-10 Thread Timothy Normand Miller
Hi, everyone,

I have a four-drive RAID1 array, and since yesterday, some problem has
rendered it unmountable (read/write anyhow).  One drive reports a read
error, so maybe the drive is failing, but I've had that happen before,
and it was easy to swap in a new drive.  This time, two more drives
are reporting that they "failed to read the system array."  I managed
to mount it read-only (by specifying the node of the fourth drive) and
rsync everything to a backup drive.  Now I'd like to try to repair.
This is where I'm running into problems.  Since I can't mount it
read-write, I can't do a scrub, so I tried "btrfs check --repair", and
this is what I got:

# btrfs check --repair /dev/sde
enabling repair mode
Checking filesystem on /dev/sde
UUID: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
checking extents
ref mismatch on [1667931533312 524288] extent item 1, found 2
attempting to repair backref discrepency for bytenr 1667931533312
Ref doesn't match the record start and is compressed, please take a
btrfs-image of this file system and send it to a btrfs developer so
they can complete this functionality for bytenr 1667931639808
failed to repair damaged filesystem, aborting

Since this specifically told me to contact a developer, I figured this
is something you guys want to know about.  :)
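
For anyone hitting the same message, capturing the metadata image it asks for might look like the sketch below. The output path /tmp/btrfs-metadata.img, the function name and the run helper are assumptions made for illustration; the helper only prints the command unless RUN=1 is set.

```shell
# Print each command; execute it only when RUN=1 is set (needs root).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Dump the filesystem metadata (no file contents) to an image file.
# $1: source device, $2: target image path
capture_metadata_image() {
    # -c 9: highest compression level, -t 4: four worker threads
    run btrfs-image -c 9 -t 4 "$1" "$2"
}
```

Usage would be "capture_metadata_image /dev/sde /tmp/btrfs-metadata.img". File names are part of the metadata, so adding -s to sanitize them may be worth considering before sending the image to anyone.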

Also, I was wondering if perhaps someone can help me figure out how to
repair it.

There are only two files that appear to be unrecoverable when I rsync,
and I can restore those from an earlier backup.  Since I can't mount
read/write, I can't go and delete those files, so I seem to be stuck.



BTRFS works beautifully with single-drive configurations.  I have
multiple, and I've never had a problem.  On the other hand, I seem to
have LOTS of trouble with 4-drive RAID1.  I get OOPSes regularly.
I've tried reporting them on bugzilla.kernel.org, but it doesn't
appear that btrfs devs actually use that.  Is this list a better place
to report those?


Thanks for the help!

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
Open Graphics Project