Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-20 Thread Timothy Normand Miller
On Thu, Aug 20, 2015 at 7:38 AM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:

 Just for reference, I've found that it is usually safer to delete the
 missing device first if possible, then add the new one and re-balance. There
 seem to be some edge-cases in the code for deleting missing devices.


The problem is that you can't do that if there's not enough space on
the remaining devices to hold all the data.
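
A rough way to check that constraint before picking an order, assuming btrfs RAID1 keeps exactly two copies of each chunk on different devices: usable capacity is then roughly the smaller of half the raw total and the total minus the largest device. The function name and the example sizes are made up for illustration, and real allocation also depends on metadata and chunk granularity.

```shell
#!/bin/sh
# raid1_usable SIZE...
# Prints the approximate usable capacity of a two-copy RAID1 profile
# across devices of the given sizes (integers, any single unit).
# Hypothetical helper, not part of btrfs-progs.
raid1_usable() {
    total=0
    largest=0
    for size in "$@"; do
        total=$((total + size))
        if [ "$size" -gt "$largest" ]; then
            largest=$size
        fi
    done
    half=$((total / 2))
    rest=$((total - largest))
    if [ "$half" -lt "$rest" ]; then
        echo "$half"
    else
        echo "$rest"
    fi
}

# Four 4 TB drives versus the three that remain after one dies.
raid1_usable 4 4 4 4   # prints 8
raid1_usable 4 4 4     # prints 6
```

With these example numbers, anything over 6 TB of data would not fit on the three survivors, so the missing device could not be deleted before a new one is added.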


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-19 Thread Timothy Normand Miller
On Wed, Aug 19, 2015 at 1:22 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:


 Timothy Normand Miller wrote on 2015/08/18 22:55 -0400:

 On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:



 Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:


 On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:


 Hi Timothy,

 Although I have replied to the bugzilla, IMHO it's more appropriate to
 discuss it on the mailing list, as it's not a kernel bug.


 All four devices were online.  The missing one was a drive that
 died, which was replaced by a new one, but btrfs wouldn't finish the
 deletion of the missing device.

 By replaced, did you mean btrfs replace? Or just change the physical
 disk
 without using btrfs replace?


 Here's what happened:

 - A drive started throwing bad sectors.  Somehow this caused metadata
 on other drives to get messed up.


 Did that cause any huge damage?

It seems that metadata was damaged on all drives.


 - I took that drive offline and mounted degraded (it's a 4-drive RAID1)
 - I did a btrfs add on a new drive and then a btrfs delete missing
 - The replacement drive failed during the replacement operation, and
 everything went to crap.
 - With some help, I got a kernel patch that allowed me to mount the
 original three drives with TWO missing devices.


 So the original 3 drives are still OK,
 original bad one is missing, and the newly added one is also missing?

 That sounds quite repairable.

Nothing I tried would run to completion.  There were always errors.


 - I added a brand new drive and then did delete missing again.  This
 time, the first delete missing was successful, but it didn't fully
 balance the drives, and there was another missing device, so I had to
 do a delete missing again, and that failed.

 I wanted to get this back online and restored from a backup, but I was
 willing to keep it this way if people wanted to probe at it, in case we
 can uncover any btrfs bugs.  So it was suggested to get a metadata
 image, but that ran into some kind of bug in btrfs-image.

 If btrfs-image doesn't work, you can also try btrfs-debug-tree.
 IIRC, debug-tree should be more robust than btrfs-image.

 BTW, have you tried btrfsck on it? Does it also cause the infinite loop?

 I'll also try to reproduce it and investigate the code directly.
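
Since btrfs-image hung rather than failing, it may help to run the offline tools under a time limit so that a loop is reported instead of waited out. A minimal sketch, assuming the coreutils `timeout` command is available; the helper name, the device /dev/sdX, and the 600-second limit are placeholders.

```shell
# run_limited SECONDS CMD [ARGS...]
# Runs CMD under coreutils `timeout` and reports when the limit is hit.
# timeout exits with status 124 when it had to stop the command.
run_limited() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Hypothetical usage against an unmounted member device:
#   run_limited 600 btrfs-debug-tree /dev/sdX > debug-tree.txt
#   run_limited 600 btrfsck /dev/sdX
```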

Well, I had to get things back online, so I've restored from backup.
I do have what limited metadata image I could get from btrfs-image.


 Thanks,
 Qu


 Currently, I'm restoring from backup, but I have at least a partial
 metadata dump.






-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: chattr +C on subvolume

2015-08-18 Thread Timothy Normand Miller
Another weird thing I've noticed.  I did this:

chattr +C /mnt/btrfs/vms

But both of these report nothing:

lsattr /mnt/btrfs/vms
lsattr /mnt/vms

Shouldn't at least one show the C attribute?


On Tue, Aug 18, 2015 at 1:36 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 Maybe this is a dumb question, but there are always corner cases.

 I have a subvolume where I want to disable CoW for VM disks.  Maybe
 that's a dumb idea, but that's a recommendation I've seen here and
 there.  Now, in the docs I've seen, +C applies to a directory.  Does
 it apply to subvolumes?  And do I apply it to the subvolume within the
 main volume, or do I apply it to the mount point where I've mounted
 the subvolume separately?  Are there any cases where the flag applies
 or not depending on how you access the files?

 The same subvolume for me is accessible via /mnt/btrfs/vms (via the
 /mnt/btrfs mount point) and /mnt/vms (where the subvolume is mounted).
 I applied +C to /mnt/btrfs/vms.  So what I'm trying to find out is if
 it also applies when files are accessed via /mnt/vms.

 Thanks.


 --
 Timothy Normand Miller, PhD
 Assistant Professor of Computer Science, Binghamton University
 http://www.cs.binghamton.edu/~millerti/
 Open Graphics Project



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


chattr +C on subvolume

2015-08-18 Thread Timothy Normand Miller
Maybe this is a dumb question, but there are always corner cases.

I have a subvolume where I want to disable CoW for VM disks.  Maybe
that's a dumb idea, but that's a recommendation I've seen here and
there.  Now, in the docs I've seen, +C applies to a directory.  Does
it apply to subvolumes?  And do I apply it to the subvolume within the
main volume, or do I apply it to the mount point where I've mounted
the subvolume separately?  Are there any cases where the flag applies
or not depending on how you access the files?

The same subvolume for me is accessible via /mnt/btrfs/vms (via the
/mnt/btrfs mount point) and /mnt/vms (where the subvolume is mounted).
I applied +C to /mnt/btrfs/vms.  So what I'm trying to find out is if
it also applies when files are accessed via /mnt/vms.

Thanks.


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: chattr +C on subvolume

2015-08-18 Thread Timothy Normand Miller
Never mind on that last lsattr question.  I needed a -d option.  Silly me.  :)
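
For anyone hitting the same thing: without -d, lsattr lists the entries inside a directory, so a directory with no entries prints nothing; -d reports the directory itself. A small sketch that checks the flag field for C follows. The helper names are invented here, and any flag strings passed in are illustrative rather than captured output.

```shell
# flags_have_nocow "FLAGS PATH"
# Succeeds if the flags field of one lsattr output line contains C.
flags_have_nocow() {
    flags=${1%% *}          # keep only the first space-separated field
    case $flags in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}

# dir_has_nocow DIR
# Asks lsattr about the directory itself (-d), not its contents.
dir_has_nocow() {
    line=$(lsattr -d -- "$1") || return 2
    flags_have_nocow "$line"
}

# Hypothetical usage with the two paths from this thread:
#   dir_has_nocow /mnt/btrfs/vms && echo "No_COW set"
#   dir_has_nocow /mnt/vms       && echo "No_COW set"
```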

On Tue, Aug 18, 2015 at 1:39 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 Another weird thing I've noticed.  I did this:

 chattr +C /mnt/btrfs/vms

 But both of these report nothing:

 lsattr /mnt/btrfs/vms
 lsattr /mnt/vms

 Shouldn't at least one show the C attribute?


 On Tue, Aug 18, 2015 at 1:36 PM, Timothy Normand Miller
 theo...@gmail.com wrote:
 Maybe this is a dumb question, but there are always corner cases.

 I have a subvolume where I want to disable CoW for VM disks.  Maybe
 that's a dumb idea, but that's a recommendation I've seen here and
 there.  Now, in the docs I've seen, +C applies to a directory.  Does
 it apply to subvolumes?  And do I apply it to the subvolume within the
 main volume, or do I apply it to the mount point where I've mounted
 the subvolume separately?  Are there any cases where the flag applies
 or not depending on how you access the files?

 The same subvolume for me is accessible via /mnt/btrfs/vms (via the
 /mnt/btrfs mount point) and /mnt/vms (where the subvolume is mounted).
 I applied +C to /mnt/btrfs/vms.  So what I'm trying to find out is if
 it also applies when files are accessed via /mnt/vms.

 Thanks.


 --
 Timothy Normand Miller, PhD
 Assistant Professor of Computer Science, Binghamton University
 http://www.cs.binghamton.edu/~millerti/
 Open Graphics Project



 --
 Timothy Normand Miller, PhD
 Assistant Professor of Computer Science, Binghamton University
 http://www.cs.binghamton.edu/~millerti/
 Open Graphics Project



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: So, wipe it out and start over or keep debugging?

2015-08-18 Thread Timothy Normand Miller
I was doing it on an unmounted volume anyhow.

On Tue, Aug 18, 2015 at 5:09 PM, Chris Murphy li...@colorremedies.com wrote:
 On Tue, Aug 18, 2015 at 5:21 AM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 On 2015-08-17 14:52, Timothy Normand Miller wrote:

 I'm not sure if I'm doing this wrong.  Here's what I'm seeing:

 # btrfs-image -c9 -t4 -w /mnt/btrfs ~/btrfs_dump.z
 Superblock bytenr is larger than device size
 Open ctree failed
 create failed (No such file or directory)


 For the source, you need to specify the underlying block device, not the top
 of the mounted filesystem.  It's trying to read the directory as a block
 device and getting very confused.  We should probably add some kind of check
 to btrfs-image to warn about that.

 Should it even be possible to use btrfs-image on a mounted volume? If
 it's written to at all, the collected image is going to be
 inconsistent.
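
The check suggested above could look roughly like this as a wrapper: refuse anything that is not a block device, and warn when the device appears in /proc/mounts. The wrapper name is made up, /dev/sdX is a placeholder, and the btrfs-image flags are copied from the command quoted in this thread.

```shell
# check_image_source PATH
# Succeeds only if PATH is a block device. Warns (but still succeeds)
# when the device is listed in /proc/mounts, since an image taken from
# a filesystem that is being written to could be inconsistent.
# Note: a multi-device btrfs lists only one member in /proc/mounts, so
# the absence of a warning is not proof that the filesystem is unmounted.
check_image_source() {
    src=$1
    if [ ! -b "$src" ]; then
        echo "error: $src is not a block device" >&2
        return 1
    fi
    if grep -qs "^$src " /proc/mounts; then
        echo "warning: $src is mounted; image may be inconsistent" >&2
    fi
    return 0
}

# Hypothetical usage:
#   check_image_source /dev/sdX && btrfs-image -c9 -t4 -w /dev/sdX ~/btrfs_dump.z
```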

 --
 Chris Murphy



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-18 Thread Timothy Normand Miller
On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:


 Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:

 On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

 Hi Timothy,

 Although I have replied to the bugzilla, IMHO it's more appropriate to
 discuss it on the mailing list, as it's not a kernel bug.


 All four devices were online.  The missing one was a drive that
 died, which was replaced by a new one, but btrfs wouldn't finish the
 deletion of the missing device.

 By replaced, did you mean btrfs replace? Or just change the physical disk
 without using btrfs replace?

Here's what happened:

- A drive started throwing bad sectors.  Somehow this caused metadata
on other drives to get messed up.
- I took that drive offline and mounted degraded (it's a 4-drive RAID1)
- I did a btrfs add on a new drive and then a btrfs delete missing
- The replacement drive failed during the replacement operation, and
everything went to crap.
- With some help, I got a kernel patch that allowed me to mount the
original three drives with TWO missing devices.
- I added a brand new drive and then did delete missing again.  This
time, the first delete missing was successful, but it didn't fully
balance the drives, and there was another missing device, so I had to
do a delete missing again, and that failed.
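
For reference, the add-then-delete sequence described in the list above can be sketched as follows. This only prints the commands rather than running them; the function name and the device names /dev/sdX (a surviving member) and /dev/sdY (the new drive) are placeholders.

```shell
# plan_missing_replace MOUNTPOINT SURVIVING_DEV NEW_DEV
# Prints (does not run) the degraded-mount, add, delete-missing sequence.
plan_missing_replace() {
    mnt=$1
    old=$2
    new=$3
    echo "mount -o degraded $old $mnt"
    echo "btrfs device add $new $mnt"
    echo "btrfs device delete missing $mnt"
}

plan_missing_replace /mnt/btrfs /dev/sdX /dev/sdY
```

The other route Qu asks about is `btrfs replace start`, which takes the source device (or its devid) and the new device in a single operation; whether it would have behaved better in this case is not something this thread establishes.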

I wanted to get this back online and restored from a backup, but I was
willing to keep it this way if people wanted to probe at it, in case we
can uncover any btrfs bugs.  So it was suggested to get a metadata
image, but that ran into some kind of bug in btrfs-image.

Currently, I'm restoring from backup, but I have at least a partial
metadata dump.


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-18 Thread Timothy Normand Miller
On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 Hi Timothy,

 Although I have replied to the bugzilla, IMHO it's more appropriate to
 discuss it on the mailing list, as it's not a kernel bug.


All four devices were online.  The missing one was a drive that
died, which was replaced by a new one, but btrfs wouldn't finish the
deletion of the missing device.

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: So, wipe it out and start over or keep debugging?

2015-08-18 Thread Timothy Normand Miller
I ran the following command.  It spent a lot of time creating a
1672450048 byte file.  Then it stopped writing to the file and started
using 100% CPU.  It's currently doing no I/O, and it's been doing that
for a while now.  Is that supposed to happen?

On Tue, Aug 18, 2015 at 9:30 AM, Timothy Normand Miller
theo...@gmail.com wrote:
 In that case, do I need to do all four block devices separately, or
 will the tool figure it out?

 On Tue, Aug 18, 2015 at 7:21 AM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 On 2015-08-17 14:52, Timothy Normand Miller wrote:

 I'm not sure if I'm doing this wrong.  Here's what I'm seeing:

 # btrfs-image -c9 -t4 -w /mnt/btrfs ~/btrfs_dump.z
 Superblock bytenr is larger than device size
 Open ctree failed
 create failed (No such file or directory)


 For the source, you need to specify the underlying block device, not the top
 of the mounted filesystem.  It's trying to read the directory as a block
 device and getting very confused.  We should probably add some kind of check
 to btrfs-image to warn about that.





 --
 Timothy Normand Miller, PhD
 Assistant Professor of Computer Science, Binghamton University
 http://www.cs.binghamton.edu/~millerti/
 Open Graphics Project



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: So, wipe it out and start over or keep debugging?

2015-08-18 Thread Timothy Normand Miller
In that case, do I need to do all four block devices separately, or
will the tool figure it out?

On Tue, Aug 18, 2015 at 7:21 AM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:
 On 2015-08-17 14:52, Timothy Normand Miller wrote:

 I'm not sure if I'm doing this wrong.  Here's what I'm seeing:

 # btrfs-image -c9 -t4 -w /mnt/btrfs ~/btrfs_dump.z
 Superblock bytenr is larger than device size
 Open ctree failed
 create failed (No such file or directory)


 For the source, you need to specify the underlying block device, not the top
 of the mounted filesystem.  It's trying to read the directory as a block
 device and getting very confused.  We should probably add some kind of check
 to btrfs-image to warn about that.





-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


btrfs-image gets stuck, using 100%, looping on bad file descriptor

2015-08-18 Thread Timothy Normand Miller
I've filed a bug report on this:

https://bugzilla.kernel.org/show_bug.cgi?id=103081

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: So, wipe it out and start over or keep debugging?

2015-08-17 Thread Timothy Normand Miller
I'm not sure if I'm doing this wrong.  Here's what I'm seeing:

# btrfs-image -c9 -t4 -w /mnt/btrfs ~/btrfs_dump.z
Superblock bytenr is larger than device size
Open ctree failed
create failed (No such file or directory)


On Mon, Aug 17, 2015 at 7:43 AM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:
 On 2015-08-15 17:46, Timothy Normand Miller wrote:

 To those of you who have been helping out with my 4-drive RAID1
 situation, is there anything further we should do to investigate this,
 in case we can uncover any more bugs, or should I just wipe everything
 out and restore from backup?

 If you need the system back online, then my suggestion would be to use
 btrfs-image to get metadata images of the disks (there's an option to clear
 out private data if need be), and then restore from backup.  That way, we
 still have the problematic images to work with and examine.




-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


So, wipe it out and start over or keep debugging?

2015-08-15 Thread Timothy Normand Miller
To those of you who have been helping out with my 4-drive RAID1
situation, is there anything further we should do to investigate this,
in case we can uncover any more bugs, or should I just wipe everything
out and restore from backup?

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
Oh, it went read-only because it OOPSed:

[39710.419966] ------------[ cut here ]------------
[39710.419969] WARNING: CPU: 1 PID: 5624 at
fs/btrfs/extent-tree.c:6226 __btrfs_free_extent+0x873/0xc80()
[39710.419970] Modules linked in: nfsd auth_rpcgss oid_registry
nfs_acl ipv6 binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek
ppdev snd_hda_codec_generic x86_pkg_temp_thermal coretemp kvm_intel
snd_hda_intel snd_hda_controller kvm snd_hda_codec snd_hda_core
microcode snd_hwdep pcspkr snd_pcm snd_timer i2c_i801 snd lpc_ich
mfd_core parport_pc battery xts gf128mul aes_x86_64 cbc sha256_generic
libiscsi scsi_transport_iscsi tg3 ptp pps_core libphy sky2 r8169
pcnet32 mii e1000 bnx2 fuse nfs lockd grace sunrpc reiserfs multipath
linear raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx raid1 raid0 dm_snapshot dm_bufio dm_crypt dm_mirror
dm_region_hash dm_log dm_mod firewire_core hid_sunplus hid_sony
hid_samsung hid_pl hid_petalynx hid_gyration usbhid uhci_hcd
usb_storage ehci_pci
[39710.419991]  ehci_hcd aic94xx libsas qla2xxx megaraid_sas
megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx
3w_ mptsas scsi_transport_sas mptfc scsi_transport_fc mptspi
mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx
gdth advansys initio BusLogic arcmsr aic7xxx aic79xx
scsi_transport_spi sg sata_mv sata_sil24 sata_sil pata_marvell
[39710.420003] CPU: 1 PID: 5624 Comm: kworker/u8:7 Tainted: GW
  4.1.4-gentoo #1
[39710.420003] Hardware name: ECS H87H3-M/H87H3-M, BIOS 4.6.5 07/16/2013
[39710.420005] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[39710.420006]   8197e672 81794418

[39710.420008]  81049cbc 01846cc5e000 880064d12000
e000
[39710.420009]  fffe  8127bc03
000fc277
[39710.420010] Call Trace:
[39710.420012]  [81794418] ? dump_stack+0x40/0x50
[39710.420014]  [81049cbc] ? warn_slowpath_common+0x7c/0xb0
[39710.420015]  [8127bc03] ? __btrfs_free_extent+0x873/0xc80
[39710.420018]  [81353ef0] ? cpumask_next_and+0x30/0x50
[39710.420019]  [81075c93] ? enqueue_task_fair+0x2c3/0xdb0
[39710.420021]  [812e054c] ? btrfs_delayed_ref_lock+0x2c/0x260
[39710.420022]  [81280ffc] ? __btrfs_run_delayed_refs+0x42c/0x1280
[39710.420024]  [8113cedd] ? __sb_start_write+0x3d/0xe0
[39710.420025]  [81285f7e] ? btrfs_run_delayed_refs.part.58+0x5e/0x270
[39710.420026]  [81286228] ? delayed_ref_async_start+0x78/0x90
[39710.420028]  [812c56f3] ? normal_work_helper+0x73/0x2a0
[39710.420029]  [8105ebbc] ? process_one_work+0x13c/0x3d0
[39710.420031]  [8105eeb3] ? worker_thread+0x63/0x480
[39710.420032]  [8105ee50] ? process_one_work+0x3d0/0x3d0
[39710.420033]  [81063a5e] ? kthread+0xce/0xf0
[39710.420034]  [81063990] ? kthread_create_on_node+0x180/0x180
[39710.420036]  [8179ced2] ? ret_from_fork+0x42/0x70
[39710.420037]  [81063990] ? kthread_create_on_node+0x180/0x180
[39710.420038] ---[ end trace 0b4fe6057cd7a1a4 ]---

On Sat, Aug 15, 2015 at 9:13 AM, Timothy Normand Miller
theo...@gmail.com wrote:
 So I tried deleting the files that I think are the problem, and the
 file system went suddenly read-only, and I got this in dmesg:

 A bunch of these first messages:
 [39710.420118]  item 45 key (1668296151040 168 524288) itemoff 1557 itemsize 53
 [39710.420118]  extent refs 1 gen 166914 flags 1
 [39710.420119]  extent data backref root 949 objectid 440675
 offset 2621440 count 1
 [39710.420120]  item 46 key (1668296675328 168 524288) itemoff 1504 itemsize 53
 [39710.420120]  extent refs 1 gen 166914 flags 1
 [39710.420121]  extent data backref root 949 objectid 440675
 offset 3145728 count 1
 [39710.420121]  item 47 key (1668297199616 168 524288) itemoff 1451 itemsize 53
 [39710.420122]  extent refs 1 gen 166914 flags 1
 [39710.420122]  extent data backref root 949 objectid 440675
 offset 3670016 count 1
 [39710.420123]  item 48 key (1668297723904 168 524288) itemoff 1398 itemsize 53
 [39710.420123]  extent refs 1 gen 166914 flags 1
 [39710.420124]  extent data backref root 949 objectid 440675
 offset 4194304 count 1
 [39710.420125]  item 49 key (1668298248192 168 524288) itemoff 1345 itemsize 53
 [39710.420125]  extent refs 1 gen 166914 flags 1
 [39710.420126]  extent data backref root 949 objectid 440675
 offset 4718592 count 1
 [39710.420126]  item 50 key (1668298772480 168 524288) itemoff 1292 itemsize 53
 [39710.420127]  extent refs 1 gen 166914 flags 1
 [39710.420127]  extent data backref root 949 objectid 440675
 offset 5242880 count 1
 [39710.420128] BTRFS error (device sdc): unable to find ref byte nr
 1668272218112 parent 0 root 949  owner 1032823 offset 655360
 [39710.420129] BTRFS: error (device sdc

Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
So I tried deleting the files that I think are the problem, and the
file system went suddenly read-only, and I got this in dmesg:

A bunch of these first messages:
[39710.420118]  item 45 key (1668296151040 168 524288) itemoff 1557 itemsize 53
[39710.420118]  extent refs 1 gen 166914 flags 1
[39710.420119]  extent data backref root 949 objectid 440675
offset 2621440 count 1
[39710.420120]  item 46 key (1668296675328 168 524288) itemoff 1504 itemsize 53
[39710.420120]  extent refs 1 gen 166914 flags 1
[39710.420121]  extent data backref root 949 objectid 440675
offset 3145728 count 1
[39710.420121]  item 47 key (1668297199616 168 524288) itemoff 1451 itemsize 53
[39710.420122]  extent refs 1 gen 166914 flags 1
[39710.420122]  extent data backref root 949 objectid 440675
offset 3670016 count 1
[39710.420123]  item 48 key (1668297723904 168 524288) itemoff 1398 itemsize 53
[39710.420123]  extent refs 1 gen 166914 flags 1
[39710.420124]  extent data backref root 949 objectid 440675
offset 4194304 count 1
[39710.420125]  item 49 key (1668298248192 168 524288) itemoff 1345 itemsize 53
[39710.420125]  extent refs 1 gen 166914 flags 1
[39710.420126]  extent data backref root 949 objectid 440675
offset 4718592 count 1
[39710.420126]  item 50 key (1668298772480 168 524288) itemoff 1292 itemsize 53
[39710.420127]  extent refs 1 gen 166914 flags 1
[39710.420127]  extent data backref root 949 objectid 440675
offset 5242880 count 1
[39710.420128] BTRFS error (device sdc): unable to find ref byte nr
1668272218112 parent 0 root 949  owner 1032823 offset 655360
[39710.420129] BTRFS: error (device sdc) in __btrfs_free_extent:6232:
errno=-2 No such entry
[39710.420131] BTRFS: error (device sdc) in
btrfs_run_delayed_refs:2821: errno=-2 No such entry
[39710.431108] pending csums is 5795840

On Sat, Aug 15, 2015 at 8:51 AM, Timothy Normand Miller
theo...@gmail.com wrote:
 I didn't quite understand profile and convert, since I can't find a
 profile option.  Is this something your patch adds?

 Before I do that, however, I have to deal with this:

 compute0 ~ # btrfs device delete missing /mnt/btrfs
 ERROR: error removing the device 'missing' - Input/output error

 [13058.298763] BTRFS warning (device sdc): csum failed ino 596 off
 623218688 csum 2756583412 expected csum 4104700738
 [13058.298775] BTRFS warning (device sdc): csum failed ino 596 off
 623222784 csum 2568037276 expected csum 275151414
 [13058.298782] BTRFS warning (device sdc): csum failed ino 596 off
 623226880 csum 2227564114 expected csum 3824181799
 [13058.298788] BTRFS warning (device sdc): csum failed ino 596 off
 623230976 csum 3298529275 expected csum 1155389604
 [13058.298794] BTRFS warning (device sdc): csum failed ino 596 off
 623235072 csum 2603391790 expected csum 1861925401
 [13058.298801] BTRFS warning (device sdc): csum failed ino 596 off
 623239168 csum 2044148708 expected csum 3227559459
 [13058.298807] BTRFS warning (device sdc): csum failed ino 596 off
 623243264 csum 615351306 expected csum 2720021058
 [13058.329747] BTRFS warning (device sdc): csum failed ino 596 off
 623218688 csum 2756583412 expected csum 4104700738
 [13058.329759] BTRFS warning (device sdc): csum failed ino 596 off
 623222784 csum 2568037276 expected csum 275151414
 [13058.329770] BTRFS warning (device sdc): csum failed ino 596 off
 623226880 csum 2227564114 expected csum 3824181799

 Because of this, it won't delete the missing device.  How do I get
 past this?  I'm pretty sure the problem is in some files I want to
 delete anyhow.  Would deleting them solve the problem?

 On Sat, Aug 15, 2015 at 12:59 AM, Anand Jain anand.j...@oracle.com wrote:

 BTW, when this is all over with, how do I make sure there are really
 two copies of everything?  Will a scrub verify this?  Should I run a
 balance operation?

 Please use 'btrfs bal profile and convert' to migrate any single chunks
 (created while there were fewer RW-able devices) back to your
 desired raid1. Do this when all the devices are back online. Kindly note
 there is a bug in the btrfs VM: you won't be able to bring a device
 online without an unmount - mount (I am working on a fix). btrfs-progs will
 be wrong in this case, so don't depend too much on that.
 So to understand inside of btrfs kernel volume I generally use:
 https://patchwork.kernel.org/patch/5816011/

 In there if bdev is null it indicates device is scanned but not part of VM
 yet. Then unmount - mount will bring device back to be part of VM.

 After applying Anand's patch, I was able to mount my 4-drive RAID1
 and bring a new fourth drive online.

 However, something weird happened
 where the first delete missing only deleted one missing drive and
 only did a partial duplication.  I've posted a bug report here:

 that seems to be normal to me. unless I am missing something else / clarity.


 Thanks, Anand



 --
 Timothy Normand Miller, PhD
 Assistant

Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
I didn't quite understand profile and convert, since I can't find a
profile option.  Is this something your patch adds?

Before I do that, however, I have to deal with this:

compute0 ~ # btrfs device delete missing /mnt/btrfs
ERROR: error removing the device 'missing' - Input/output error

[13058.298763] BTRFS warning (device sdc): csum failed ino 596 off
623218688 csum 2756583412 expected csum 4104700738
[13058.298775] BTRFS warning (device sdc): csum failed ino 596 off
623222784 csum 2568037276 expected csum 275151414
[13058.298782] BTRFS warning (device sdc): csum failed ino 596 off
623226880 csum 2227564114 expected csum 3824181799
[13058.298788] BTRFS warning (device sdc): csum failed ino 596 off
623230976 csum 3298529275 expected csum 1155389604
[13058.298794] BTRFS warning (device sdc): csum failed ino 596 off
623235072 csum 2603391790 expected csum 1861925401
[13058.298801] BTRFS warning (device sdc): csum failed ino 596 off
623239168 csum 2044148708 expected csum 3227559459
[13058.298807] BTRFS warning (device sdc): csum failed ino 596 off
623243264 csum 615351306 expected csum 2720021058
[13058.329747] BTRFS warning (device sdc): csum failed ino 596 off
623218688 csum 2756583412 expected csum 4104700738
[13058.329759] BTRFS warning (device sdc): csum failed ino 596 off
623222784 csum 2568037276 expected csum 275151414
[13058.329770] BTRFS warning (device sdc): csum failed ino 596 off
623226880 csum 2227564114 expected csum 3824181799

Because of this, it won't delete the missing device.  How do I get
past this?  I'm pretty sure the problem is in some files I want to
delete anyhow.  Would deleting them solve the problem?
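
One way to narrow down which files are involved is to pull the inode numbers out of the csum warnings and then map them to paths. A sketch, assuming the log lines have the "csum failed ino N" form shown above; the helper name is invented, and because btrfs inode numbers are per subvolume, the path given to inode-resolve would need to be inside the right subvolume.

```shell
# csum_failed_inodes
# Reads kernel log text on stdin and prints each inode number that
# appears in a "csum failed ino N" message, once, in numeric order.
csum_failed_inodes() {
    grep -o 'csum failed ino [0-9][0-9]*' | awk '{print $4}' | sort -un
}

# Hypothetical usage:
#   dmesg | csum_failed_inodes
#   btrfs inspect-internal inode-resolve 596 /mnt/btrfs
```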

On Sat, Aug 15, 2015 at 12:59 AM, Anand Jain anand.j...@oracle.com wrote:

 BTW, when this is all over with, how do I make sure there are really
 two copies of everything?  Will a scrub verify this?  Should I run a
 balance operation?

 Please use 'btrfs bal profile and convert' to migrate any single chunks
 (created while there were fewer RW-able devices) back to your
 desired raid1. Do this when all the devices are back online. Kindly note
 there is a bug in the btrfs VM: you won't be able to bring a device
 online without an unmount - mount (I am working on a fix). btrfs-progs will
 be wrong in this case, so don't depend too much on that.
 So to understand inside of btrfs kernel volume I generally use:
 https://patchwork.kernel.org/patch/5816011/

 In there if bdev is null it indicates device is scanned but not part of VM
 yet. Then unmount - mount will bring device back to be part of VM.
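
One possible reading of the 'profile and convert' advice, offered as a sketch and not as something Anand's patch adds: the convert balance filter rewrites chunks into a target profile, and the soft modifier skips chunks that already have it. The helper name below is invented, and it assumes `btrfs filesystem df` prints lines of the form `Data, RAID1: total=...`.

```shell
# non_raid1_chunks
# Reads `btrfs filesystem df` output on stdin and prints block-group
# lines whose profile is not RAID1. GlobalReserve is skipped because it
# is reported as "single" even on a healthy RAID1 filesystem.
non_raid1_chunks() {
    awk '/^GlobalReserve/ || /^$/ { next } !/, RAID1:/ { print }'
}

# Hypothetical usage once all devices are back online:
#   btrfs filesystem df /mnt/btrfs | non_raid1_chunks
#   btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/btrfs
#   btrfs scrub start /mnt/btrfs
```

As far as I understand it, a scrub checks the copies that exist against their checksums; it would not by itself create a second copy of chunks that were allocated as single while the array was degraded.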

 After applying Anand's patch, I was able to mount my 4-drive RAID1
 and bring a new fourth drive online.

 However, something weird happened
 where the first delete missing only deleted one missing drive and
 only did a partial duplication.  I've posted a bug report here:

 that seems to be normal to me. unless I am missing something else / clarity.


 Thanks, Anand



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
Here's the associated bug report with the full dmesg:

https://bugzilla.kernel.org/show_bug.cgi?id=102941

On Sat, Aug 15, 2015 at 9:13 AM, Timothy Normand Miller
theo...@gmail.com wrote:
 So I tried deleting the files that I think are the problem, and the
 file system went suddenly read-only, and I got this in dmesg:

 A bunch of these first messages:
 [39710.420118]  item 45 key (1668296151040 168 524288) itemoff 1557 itemsize 
 53
 [39710.420118]  extent refs 1 gen 166914 flags 1
 [39710.420119]  extent data backref root 949 objectid 440675
 offset 2621440 count 1
 [39710.420120]  item 46 key (1668296675328 168 524288) itemoff 1504 itemsize 
 53
 [39710.420120]  extent refs 1 gen 166914 flags 1
 [39710.420121]  extent data backref root 949 objectid 440675
 offset 3145728 count 1
 [39710.420121]  item 47 key (1668297199616 168 524288) itemoff 1451 itemsize 
 53
 [39710.420122]  extent refs 1 gen 166914 flags 1
 [39710.420122]  extent data backref root 949 objectid 440675
 offset 3670016 count 1
 [39710.420123]  item 48 key (1668297723904 168 524288) itemoff 1398 itemsize 
 53
 [39710.420123]  extent refs 1 gen 166914 flags 1
 [39710.420124]  extent data backref root 949 objectid 440675
 offset 4194304 count 1
 [39710.420125]  item 49 key (1668298248192 168 524288) itemoff 1345 itemsize 
 53
 [39710.420125]  extent refs 1 gen 166914 flags 1
 [39710.420126]  extent data backref root 949 objectid 440675
 offset 4718592 count 1
 [39710.420126]  item 50 key (1668298772480 168 524288) itemoff 1292 itemsize 
 53
 [39710.420127]  extent refs 1 gen 166914 flags 1
 [39710.420127]  extent data backref root 949 objectid 440675
 offset 5242880 count 1
 [39710.420128] BTRFS error (device sdc): unable to find ref byte nr
 1668272218112 parent 0 root 949  owner 1032823 offset 655360
 [39710.420129] BTRFS: error (device sdc) in __btrfs_free_extent:6232:
 errno=-2 No such entry
 [39710.420131] BTRFS: error (device sdc) in
 btrfs_run_delayed_refs:2821: errno=-2 No such entry
 [39710.431108] pending csums is 5795840

 On Sat, Aug 15, 2015 at 8:51 AM, Timothy Normand Miller
 theo...@gmail.com wrote:
 I didn't quite understand profile and convert, since I can't find a
 profile option.  Is this something your patch adds?

 Before I do that, however, I have to deal with this:

 compute0 ~ # btrfs device delete missing /mnt/btrfs
 ERROR: error removing the device 'missing' - Input/output error

 [13058.298763] BTRFS warning (device sdc): csum failed ino 596 off
 623218688 csum 2756583412 expected csum 4104700738
 [13058.298775] BTRFS warning (device sdc): csum failed ino 596 off
 623222784 csum 2568037276 expected csum 275151414
 [13058.298782] BTRFS warning (device sdc): csum failed ino 596 off
 623226880 csum 2227564114 expected csum 3824181799
 [13058.298788] BTRFS warning (device sdc): csum failed ino 596 off
 623230976 csum 3298529275 expected csum 1155389604
 [13058.298794] BTRFS warning (device sdc): csum failed ino 596 off
 623235072 csum 2603391790 expected csum 1861925401
 [13058.298801] BTRFS warning (device sdc): csum failed ino 596 off
 623239168 csum 2044148708 expected csum 3227559459
 [13058.298807] BTRFS warning (device sdc): csum failed ino 596 off
 623243264 csum 615351306 expected csum 2720021058
 [13058.329747] BTRFS warning (device sdc): csum failed ino 596 off
 623218688 csum 2756583412 expected csum 4104700738
 [13058.329759] BTRFS warning (device sdc): csum failed ino 596 off
 623222784 csum 2568037276 expected csum 275151414
 [13058.329770] BTRFS warning (device sdc): csum failed ino 596 off
 623226880 csum 2227564114 expected csum 3824181799

 Because of this, it won't delete the missing device.  How do I get
 past this?  I'm pretty sure the problem is in some files I want to
 delete anyhow.  Would deleting them solve the problem?

 On Sat, Aug 15, 2015 at 12:59 AM, Anand Jain anand.j...@oracle.com wrote:

 BTW, when this is all over with, how do I make sure there are really
 two copies of everything?  Will a scrub verify this?  Should I run a
 balance operation?

 Please use the btrfs balance profile convert option to migrate any single
 chunks (created while fewer devices were writable) back to your desired
 raid1. Do this when all the devices are back online. Note that there is a
 bug in the btrfs volume manager (VM): you won't be able to bring a device
 online without an unmount and mount (I am working on a fix). btrfs-progs
 output will be wrong in this case, so don't depend too much on it.
 To see the kernel's view of the btrfs volume, I generally use:
 https://patchwork.kernel.org/patch/5816011/

 In its output, a null bdev indicates the device has been scanned but is not
 part of the VM yet. An unmount and mount will then bring the device back
 into the VM.

 After applying Anand's patch, I was able to mount my 4-drive RAID1
 and bring a new fourth drive online.

 However, something weird happened
 where the first delete missing only

Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
I'm not sure my situation is quite like the one you linked, so here's
my bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=102881

On Fri, Aug 14, 2015 at 2:44 PM, Chris Murphy li...@colorremedies.com wrote:
 On Fri, Aug 14, 2015 at 12:12 PM, Timothy Normand Miller
 theo...@gmail.com wrote:
 Sorry about that empty email.  I hit a wrong key, and gmail decided to send.

 Anyhow, my replacement drive is going to arrive this evening, and I
 need to know how to add it to my btrfs array.  Here's the situation:

 - I had a drive fail, so I removed it and mounted degraded.
 - I hooked up a replacement drive, did an add on that one, and did a
 delete missing.
 - During the rebalance, the replacement drive failed, there were OOPSes, etc.
 - Now, although all of my data is there, I can't mount degraded,
 because btrfs is complaining that too many devices are missing (3 are
 there, but it sees 2 missing).

 It might be related to this (long) bug:
 https://bugzilla.kernel.org/show_bug.cgi?id=92641

 While Btrfs RAID 1 can tolerate only a single device failure, what you
 have is an in-progress rebuild of a missing device. If it becomes
 missing, the volume should be no worse off than it was before. But
 Btrfs doesn't see it this way; instead it sees this as two separate
 missing devices and now too many devices missing and it refuses to
 proceed. And there's no mechanism to remove missing devices unless you
 can mount rw. So it's stuck.


 So I could use some help with cleaning up this mess.  All the data is
 there, so I need to know how to either force it to mount degraded, or
 add and remove devices offline.  Where do I begin?

 You can try to ask on IRC. I have no ideas for this scenario, I've
 tried and failed. My case was throw away, what should still be
 possible is using btrfs restore.


 Also, doesn't it seem a bit arbitrary that there are too many
 missing, when all of the data is there?  If I understand correctly,
 all four drives in my RAID1 should all have copies of the metadata,

 No that's not correct. RAID 1 means 2 copies of metadata. In a 4
 device RAID 1 that's still only 2 copies. It is not n-way RAID 1.

 But that doesn't matter here, the problem is that Btrfs has a narrow
 idea of the volume, it assumes without context that once the number of
 devices is below the minimum, the volume can't be mounted. In reality,
 an exception exists if the failure is for an in-progress rebuild of a
 missing drive. That drive failing should mean the volume is no worse
 off than before but Btrfs doesn't know that.

 Pretty sure about that anyway.


 and of the remaining three good drives, there should be one or two
 copies of every data block.  So it's all there, but btrfs has decided,
 based on the NUMBER of missing devices, that it won't mount.
 Shouldn't it refuse to mount if it knows there is data missing?  For
 that matter, why should it even refuse in that case?  So some data
 might be missing, so it should throw some errors if you try to access
 that missing data.  Right?

 I think no data is missing, no metadata is missing, and Btrfs is
 confused and stuck in this case.

 --
 Chris Murphy



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
My

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
Sorry about that empty email.  I hit a wrong key, and gmail decided to send.

Anyhow, my replacement drive is going to arrive this evening, and I
need to know how to add it to my btrfs array.  Here's the situation:

- I had a drive fail, so I removed it and mounted degraded.
- I hooked up a replacement drive, did an add on that one, and did a
delete missing.
- During the rebalance, the replacement drive failed, there were OOPSes, etc.
- Now, although all of my data is there, I can't mount degraded,
because btrfs is complaining that too many devices are missing (3 are
there, but it sees 2 missing).

So I could use some help with cleaning up this mess.  All the data is
there, so I need to know how to either force it to mount degraded, or
add and remove devices offline.  Where do I begin?

Also, doesn't it seem a bit arbitrary that there are too many
missing, when all of the data is there?  If I understand correctly,
all four drives in my RAID1 should all have copies of the metadata,
and of the remaining three good drives, there should be one or two
copies of every data block.  So it's all there, but btrfs has decided,
based on the NUMBER of missing devices, that it won't mount.
Shouldn't it refuse to mount if it knows there is data missing?  For
that matter, why should it even refuse in that case?  So some data
might be missing, so it should throw some errors if you try to access
that missing data.  Right?

Thanks!

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
On Fri, Aug 14, 2015 at 7:49 PM, Anand Jain anand.j...@oracle.com wrote:



 - I had a drive fail, so I removed it and mounted degraded.


 That's a bit dangerous to do without the patch below. The patch has more
 details on why.

Just to be clear, I removed the drive (the original failed drive) when
the power was off, then powered up, and then mounted degraded.  That's
not dangerous that I know of.


 - I hooked up a replacement drive, did an add on that one, and did a
 delete missing.
 - During the rebalance, the replacement drive failed, there were OOPSes,
 etc.
 - Now, although all of my data is there, I can't mount degraded,
 because btrfs is complaining that too many devices are missing (3 are
 there, but it sees 2 missing).



 This is addressed in the patch

   [PATCH 23/23] Btrfs: allow -o rw,degraded for single group profile


Where is this patch, and what kernel versions can this be applied to?



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Can't mount degraded. How to remove/add drives OFFLINE?

2015-08-14 Thread Timothy Normand Miller
I applied that patch to my 4.1.4, it mounted degraded, and now it's
balancing to the new drive.

Thanks for all the help!

On Fri, Aug 14, 2015 at 8:28 PM, Anand Jain anand.j...@oracle.com wrote:


 Just to be clear, I removed the drive (the original failed drive) when
 the power was off, then powered up, and then mounted degraded.  That's
 not dangerous that I know of.


 patch has details. pls refer.


 Where is this patch, and what kernel versions can this be applied to?



 https://patchwork.kernel.org/patch/7014141/

 It's on 4.3, but should apply cleanly on earlier kernels.

 thanks
 Anand



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-14 Thread Timothy Normand Miller
BTW, when this is all over with, how do I make sure there are really
two copies of everything?  Will a scrub verify this?  Should I run a
balance operation?
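
A hedged sketch of one way to check this. It assumes the usual
"btrfs filesystem df" output format (one "Type, PROFILE: total=..., used=..."
line per block group type) and that the GlobalReserve line, which is always
reported as single, can be ignored. The function names are illustrative only.
A scrub verifies checksums on every existing copy but does not change
profiles; the balance convert filters are what rewrite single chunks as raid1.

```sh
# Print block-group lines from `btrfs filesystem df` output (stdin) whose
# profile is not RAID1. GlobalReserve is skipped: it is always "single".
non_raid1_chunks() {
    grep -v '^GlobalReserve' | grep -v ', RAID1:'
}

# Convert data and metadata chunks to raid1; the "soft" filter skips chunks
# that already have the target profile. $1 = mount point.
convert_to_raid1() {
    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft "$1"
}

# Verify checksums of all copies, staying in the foreground. $1 = mount point.
scrub_foreground() {
    btrfs scrub start -B "$1"
}

# Typical use (needs root):
#   btrfs filesystem df /mnt/btrfs | non_raid1_chunks   # empty = all raid1
#   convert_to_raid1 /mnt/btrfs && scrub_foreground /mnt/btrfs
```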

On Fri, Aug 14, 2015 at 11:29 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 After applying Anand's patch, I was able to mount my 4-drive RAID1 and
 bring a new fourth drive online.  However, something weird happened
 where the first delete missing only deleted one missing drive and
 only did a partial duplication.  I've posted a bug report here:

 https://bugzilla.kernel.org/show_bug.cgi?id=102901




-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-14 Thread Timothy Normand Miller
After applying Anand's patch, I was able to mount my 4-drive RAID1 and
bring a new fourth drive online.  However, something weird happened
where the first delete missing only deleted one missing drive and
only did a partial duplication.  I've posted a bug report here:

https://bugzilla.kernel.org/show_bug.cgi?id=102901

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
Actually, it didn't resume.  The btrfs delete missing was using 100%
of the I/O bandwidth but wasn't actually doing any disk reads or
writes.  I tried to reboot, but the system wouldn't go down, so after
waiting 10 minutes, I power-cycled.  Now I can't mount at all and
here's what dmesg says about that:

[  236.118419] BTRFS info (device sdb): allowing degraded mounts
[  236.118421] BTRFS info (device sdb): disk space caching is enabled
[  236.165470] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
corrupt 0, gen 2
[  245.883595] BTRFS: too many missing devices, writeable mount is not allowed
[  245.946570] BTRFS: open_ctree failed

It thinks now that there should be five devices, and since there are
only three available, it won't let me mount.

# btrfs filesystem show
Label: none  uuid: 49ac9ad2-b529-4e6e-aef9-1c5b9e8a72f8
Total devices 1 FS bytes used 28.26GiB
devid1 size 79.69GiB used 41.03GiB path /dev/sda3

warning, device 1 is missing
warning, device 1 is missing
warning devid 1 not found already
warning devid 5 not found already
Label: none  uuid: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
Total devices 5 FS bytes used 1.46TiB
devid2 size 931.51GiB used 767.00GiB path /dev/sdd
devid3 size 931.51GiB used 745.03GiB path /dev/sdc
devid4 size 931.51GiB used 767.00GiB path /dev/sdb
*** Some devices missing

btrfs-progs v4.1.2
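
The arithmetic behind "five devices, three available" can be read straight
off that output. A small sketch, assuming a single filesystem's stanza from
"btrfs filesystem show" arrives on stdin; the function name is made up.

```sh
# Print how many devices a `btrfs filesystem show` stanza (stdin) reports as
# expected ("Total devices N") but does not list on a devid line.
# Lines such as "warning devid 5 not found already" are not counted, because
# only lines that start with "devid" (after optional whitespace) match.
missing_device_count() {
    awk '/Total devices/ { total = $3 }
         /^[[:space:]]*devid/ { present++ }
         END { print total - present }'
}

# Typical use:
#   btrfs filesystem show /dev/sdb | missing_device_count
```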



On Wed, Aug 12, 2015 at 4:27 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 It resumed on its own.  Weird.

 On Wed, Aug 12, 2015 at 4:23 PM, Timothy Normand Miller
 theo...@gmail.com wrote:
 On Wed, Aug 12, 2015 at 2:10 PM, Chris Murphy li...@colorremedies.com 
 wrote:


 Anyway it looks like it's hardware related, but I don't know what
 device ata4.00 is, so maybe this helps:
 http://superuser.com/questions/617192/mapping-ata-device-number-to-logical-device-name

 # ata=4; ls -l /sys/block/sd* | grep $(grep $ata
 /sys/class/scsi_host/host*/unique_id | awk -F'/' '{print $5}')
 lrwxrwxrwx 1 root root 0 Aug 12 16:21 /sys/block/sde -
 ../devices/pci:00/:00:1f.5/ata4/host3/target3:0:0/3:0:0:0/block/sde

 sde is the newly attached drive, replacing the one that had appeared
 to have bad sectors.  So it looks like either this new motherboard has
 a bad connector, or the cable is bad.  I'm going to swap it out for a
 different SATA cable.  How do I resume the failed operation?  And
 should I reboot because of the OOPSes?




-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
On Wed, Aug 12, 2015 at 2:10 PM, Chris Murphy li...@colorremedies.com wrote:


 Anyway it looks like it's hardware related, but I don't know what
 device ata4.00 is, so maybe this helps:
 http://superuser.com/questions/617192/mapping-ata-device-number-to-logical-device-name

# ata=4; ls -l /sys/block/sd* | grep $(grep $ata /sys/class/scsi_host/host*/unique_id | awk -F'/' '{print $5}')
lrwxrwxrwx 1 root root 0 Aug 12 16:21 /sys/block/sde -> ../devices/pci:00/:00:1f.5/ata4/host3/target3:0:0/3:0:0:0/block/sde

sde is the newly attached drive, replacing the one that had appeared
to have bad sectors.  So it looks like either this new motherboard has
a bad connector, or the cable is bad.  I'm going to swap it out for a
different SATA cable.  How do I resume the failed operation?  And
should I reboot because of the OOPSes?
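
A slightly more defensive variant of the one-liner above, for what it's
worth. It relies on the same observation as the output shown here: the
resolved sysfs path of the disk contains an ataN component. The function
name and the optional second argument (a directory to scan instead of
/sys/block, handy for testing) are assumptions made for the sketch.

```sh
# Print the sd* block devices attached to ATA port N.
# $1 = ATA port number (the 4 in "ata4.00"); $2 = directory to scan
# (defaults to /sys/block). Matches on an exact "/ataN/" path component,
# so port 4 does not also match ata14 or ata40.
ata_to_block() {
    for dev in "${2:-/sys/block}"/sd*; do
        case "$(readlink -f "$dev")" in
            */ata"$1"/*) basename "$dev" ;;
        esac
    done
}

# Typical use:
#   ata_to_block 4
```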

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
Ok, here's what's happening.  A few years ago, I took my old WD green
drives and put them in a box as backups to a new array of Seagate
drives.  When one of those Seagate drives failed (just out of
warranty, of course), I replaced it with one of the WD's.  That was
cooking along just fine until just a few days ago when it started
throwing bad sectors and for some reason caused btrfs to have lots of
problems with the system block on the other three drives.  I tried to
add the other spare and remove the old spare, but for whatever reason,
this second spare (which had been fine when I boxed it in an
anti-static bag), is now failing catastrophically.  Now that that has
happened, the btrfs volume is stuck in a funny state where it won't
mount in degraded mode, because it thinks there should be five
devices, but there are only the original three.

I'm going to go ahead and order a new drive.  Meanwhile, is there a
way to add and remove drives from volumes that can't be mounted?


On Wed, Aug 12, 2015 at 4:48 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 Actually, it didn't resume.  The btrfs delete missing was using 100%
 of the I/O bandwidth but wasn't actually doing any disk reads or
 writes.  I tried to reboot, but the system wouldn't go down, so after
 waiting 10 minutes, I power-cycled.  Now I can't mount at all and
 here's what dmesg says about that:

 [  236.118419] BTRFS info (device sdb): allowing degraded mounts
 [  236.118421] BTRFS info (device sdb): disk space caching is enabled
 [  236.165470] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
 corrupt 0, gen 2
 [  245.883595] BTRFS: too many missing devices, writeable mount is not allowed
 [  245.946570] BTRFS: open_ctree failed

 It thinks now that there should be five devices, and since there are
 only three available, it won't let me mount.

 # btrfs filesystem show
 Label: none  uuid: 49ac9ad2-b529-4e6e-aef9-1c5b9e8a72f8
 Total devices 1 FS bytes used 28.26GiB
 devid1 size 79.69GiB used 41.03GiB path /dev/sda3

 warning, device 1 is missing
 warning, device 1 is missing
 warning devid 1 not found already
 warning devid 5 not found already
 Label: none  uuid: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
 Total devices 5 FS bytes used 1.46TiB
 devid2 size 931.51GiB used 767.00GiB path /dev/sdd
 devid3 size 931.51GiB used 745.03GiB path /dev/sdc
 devid4 size 931.51GiB used 767.00GiB path /dev/sdb
 *** Some devices missing

 btrfs-progs v4.1.2



 On Wed, Aug 12, 2015 at 4:27 PM, Timothy Normand Miller
 theo...@gmail.com wrote:
 It resumed on its own.  Weird.

 On Wed, Aug 12, 2015 at 4:23 PM, Timothy Normand Miller
 theo...@gmail.com wrote:
 On Wed, Aug 12, 2015 at 2:10 PM, Chris Murphy li...@colorremedies.com 
 wrote:


 Anyway it looks like it's hardware related, but I don't know what
 device ata4.00 is, so maybe this helps:
 http://superuser.com/questions/617192/mapping-ata-device-number-to-logical-device-name

 # ata=4; ls -l /sys/block/sd* | grep $(grep $ata
 /sys/class/scsi_host/host*/unique_id | awk -F'/' '{print $5}')
 lrwxrwxrwx 1 root root 0 Aug 12 16:21 /sys/block/sde -
 ../devices/pci:00/:00:1f.5/ata4/host3/target3:0:0/3:0:0:0/block/sde

 sde is the newly attached drive, replacing the one that had appeared
 to have bad sectors.  So it looks like either this new motherboard has
 a bad connector, or the cable is bad.  I'm going to swap it out for a
 different SATA cable.  How do I resume the failed operation?  And
 should I reboot because of the OOPSes?




-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
It resumed on its own.  Weird.

On Wed, Aug 12, 2015 at 4:23 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 On Wed, Aug 12, 2015 at 2:10 PM, Chris Murphy li...@colorremedies.com wrote:


 Anyway it looks like it's hardware related, but I don't know what
 device ata4.00 is, so maybe this helps:
 http://superuser.com/questions/617192/mapping-ata-device-number-to-logical-device-name

 # ata=4; ls -l /sys/block/sd* | grep $(grep $ata
 /sys/class/scsi_host/host*/unique_id | awk -F'/' '{print $5}')
 lrwxrwxrwx 1 root root 0 Aug 12 16:21 /sys/block/sde -
 ../devices/pci:00/:00:1f.5/ata4/host3/target3:0:0/3:0:0:0/block/sde

 sde is the newly attached drive, replacing the one that had appeared
 to have bad sectors.  So it looks like either this new motherboard has
 a bad connector, or the cable is bad.  I'm going to swap it out for a
 different SATA cable.  How do I resume the failed operation?  And
 should I reboot because of the OOPSes?




-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-12 Thread Timothy Normand Miller
I added a new device and then did a delete missing.  I lost the
terminal (should have used gnu screen), so I didn't see the stdout,
but the operation aborted at some point.  There's a ton of output in
dmesg related to this, along with some OOPSes, which I have attached
as dmesg2 here:

https://bugzilla.kernel.org/show_bug.cgi?id=102691


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 5:24 PM, Chris Murphy li...@colorremedies.com wrote:

 There is still data redundancy.  Will a scrub at least notice that the
 copies differ?

 No, that's what I mean by nodatasum means no raid1 self-healing is
 possible. You have data redundancy, but without checksums btrfs has
 no way to know if they differ. It doesn't do two reads and compares
 them, it's just like md raid, it picks one device, and so long as
 there's no read error from the device, that copy of the data is
 assumed to be good.

Ok, that makes sense.  I'm guessing it wouldn't be worth it to add a
feature like this because (a) few people use nodatacow or end up in my
situation, and (b) if they did, and the two copies were inconsistent,
what would you do?  I suppose for me, it would be nice to know which
files were affected.


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 4:48 PM, Chris Murphy li...@colorremedies.com wrote:


 The compress is ignored, and it looks like nodatasum and nodatacow
 apply to everything. The nodatasum means no raid1 self-healing is
 possible for any data on the entire volume. Metadata checksumming is
 still enabled.

Ugh.  So I need to change my fstab file.  I swear, some expert on IRC
told me that this should work fine, which is why I did it.  In fact, I
think they recommended it on the basis that I wanted to put VM images
on one of the subvolumes.  This discussion occurred a long time ago,
well before RAID5 was even partially implemented.
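
As Chris notes above, nodatacow given for one subvolume mount appears to
apply to the whole filesystem. A commonly used alternative for VM images is
the per-directory No_COW file attribute (chattr +C), sketched below. Caveats:
the attribute only takes effect for files created after it is set, and No_COW
files are not checksummed either, so this narrows the loss of self-healing to
that directory rather than removing it. The function names and the fstab
check are illustrative assumptions, not something from this thread.

```sh
# Print the device field of every btrfs fstab entry (fstab text on stdin)
# whose options include nodatacow or nodatasum.
fstab_nodatacow_devices() {
    awk '$1 !~ /^#/ && $3 == "btrfs" && $4 ~ /(^|,)(nodatacow|nodatasum)(,|$)/ { print $1 }' | sort -u
}

# Create a directory whose future files are No_COW. $1 = directory path.
# Existing files have to be rewritten as new files to pick the attribute up.
make_nocow_dir() {
    mkdir -p "$1" && chattr +C "$1"
}

# Typical use:
#   fstab_nodatacow_devices < /etc/fstab
#   make_nocow_dir /mnt/vms/images
```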

There is still data redundancy.  Will a scrub at least notice that the
copies differ?


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 12:21 AM, Chris Murphy li...@colorremedies.com wrote:
 On Mon, Aug 10, 2015 at 7:23 PM, Timothy Normand Miller
 theo...@gmail.com wrote:
 On Mon, Aug 10, 2015 at 6:52 PM, Chris Murphy li...@colorremedies.com 
 wrote:

 - complete dmesg for the failed mount

 It really doesn't say much.  I have things like this:
 [8.643535] BTRFS info (device sdc): disk space caching is enabled
 [8.643789] BTRFS: failed to read the system array on sdc
 [8.706062] BTRFS: open_ctree failed
 [8.707124] BTRFS info (device sdc): disk space caching is enabled
 [8.710924] BTRFS: failed to read the system array on sdc
 [8.766080] BTRFS: open_ctree failed
 [8.766903] BTRFS info (device sdc): setting nodatacow, compression 
 disabled
 [8.766905] BTRFS info (device sdc): disk space caching is enabled
 [8.767152] BTRFS: failed to read the system array on sdc
 [8.936019] BTRFS: open_ctree failed
 [8.936906] BTRFS info (device sdc): disk space caching is enabled
 [8.939922] BTRFS: failed to read the system array on sdc
 [8.995984] BTRFS: open_ctree failed
 [8.996796] BTRFS info (device sdc): disk space caching is enabled
 [8.997093] BTRFS: failed to read the system array on sdc
 [9.125936] BTRFS: open_ctree failed

 It looks like there's not enough redundancy remaining to mount and in
 such a case there's really not much to be done.

 I don't see nodatacow in your fstab, so I don't know why that's
 happening. That means no checksumming for data.

Sorry.  I was dumb.  I only showed you the entry for what I was trying
to mount manually.  I have subvolumes, and this is what is in my
fstab:

UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /home btrfs compress=lzo,noatime,space_cache,subvol=home 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/btrfs btrfs compress=lzo,noatime,space_cache 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/vms btrfs noatime,nodatacow,space_cache,subvol=vms 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/oldfiles btrfs compress=lzo,noatime,space_cache,subvol=oldfiles 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/backup btrfs compress=lzo,noatime,space_cache,subvol=backup 0 2





 Also, when I manually try to mount, I get things like this:

 # mount /mnt/btrfs
 mount: wrong fs type, bad option, bad superblock on /dev/sdc,
missing codepage or helper program, or other error

 Have you tried to mount with -o degraded?

Ooh!  I can do that!

Mounting ro,degraded, I see this:

[94197.902443] BTRFS info (device sdc): allowing degraded mounts
[94197.902448] BTRFS info (device sdc): disk space caching is enabled
[94198.240621] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
corrupt 0, gen 2

Mounting rw,degraded, I see this:

[94312.091613] BTRFS info (device sdc): allowing degraded mounts
[94312.091618] BTRFS info (device sdc): disk space caching is enabled
[94312.194513] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45, corrupt 0, gen 2
[94319.824563] BTRFS: checking UUID tree





 Well, if I get something lengthy, I'll attach it to my bug report.
 Did the information I reported help at all?

 The entire dmesg is still useful because it should show libata errors
 if these aren't fully failed drives. So you should file a bug and
 include, literally, the entire unedited dmesg.

Alright, I'll do that.  Thanks!



 --
 Chris Murphy



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 1:56 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 On Tue, Aug 11, 2015 at 12:21 AM, Chris Murphy li...@colorremedies.com 
 wrote:

 The entire dmesg is still useful because it should show libata errors
 if these aren't fully failed drives. So you should file a bug and
 include, literally, the entire unedited dmesg.

 Alright, I'll do that.  Thanks!


Here you go:

https://bugzilla.kernel.org/show_bug.cgi?id=102691

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 3:57 PM, Chris Murphy li...@colorremedies.com wrote:
 On Tue, Aug 11, 2015 at 12:04 PM, Timothy Normand Miller
 theo...@gmail.com wrote:

 https://bugzilla.kernel.org/show_bug.cgi?id=102691

 [7.729124] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c devid 2 transid 226237 /dev/sdd
 [7.746115] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c devid 4 transid 226237 /dev/sdb
 [7.826493] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c devid 3 transid 226237 /dev/sdc

 What do you get for 'btrfs fi show'?

# btrfs fi show
Label: none  uuid: 49ac9ad2-b529-4e6e-aef9-1c5b9e8a72f8
	Total devices 1 FS bytes used 28.33GiB
	devid    1 size 79.69GiB used 41.03GiB path /dev/sda3

Label: none  uuid: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
	Total devices 4 FS bytes used 1.46TiB
	devid    2 size 931.51GiB used 767.00GiB path /dev/sdd
	devid    3 size 931.51GiB used 760.03GiB path /dev/sdc
	devid    4 size 931.51GiB used 767.00GiB path /dev/sdb
	*** Some devices missing

Label: none  uuid: f9331766-e50a-43d5-98dc-fabf5c68321d
	Total devices 1 FS bytes used 2.99TiB
	devid    1 size 3.64TiB used 3.01TiB path /dev/sde1

btrfs-progs v4.1.2


 I see devid 2, 3, 4 only for this volume UUID. So you definitely
 appear to have a failed device and that's why it doesn't mount
 automatically at boot time. You just need to use -o degraded, and that
 should work assuming no problems with the other three devices. If it
 does work, 'btrfs replace start...' is the ideal way to replace the
 failed drive.
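
A minimal sketch of the sequence suggested above, written as a dry run that only prints the commands unless DRY_RUN=0 is set. The missing devid of 1 is inferred from the listing (only devids 2, 3 and 4 appear) and is an assumption, as are the replacement disk path /dev/sdX and the run helper itself.

```shell
# Sketch only: prints each command instead of running it unless DRY_RUN=0.
# All default values below are assumptions for illustration.
MISSING_DEVID="${MISSING_DEVID:-1}"   # devid not listed by 'btrfs fi show' (assumed)
SURVIVOR="${SURVIVOR:-/dev/sdc}"      # any remaining member of the array
NEW_DEV="${NEW_DEV:-/dev/sdX}"        # hypothetical replacement disk
MNT="${MNT:-/mnt/btrfs}"

# run is a hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

run mount -o degraded "$SURVIVOR" "$MNT"
run btrfs replace start "$MISSING_DEVID" "$NEW_DEV" "$MNT"
run btrfs replace status -1 "$MNT"
```

Keeping the dry run as the default seems prudent here, since a replace writes to the target disk and should only be started once the device names have been double-checked.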

It's missing because I physically disconnected it.  Someone on IRC
suggested I try this in case the drive with the bad sector was
interfering.  Of course, now that I've done this and mounted
read/write, we can't reintegrate the failing drive.

If I lose the array, I won't cry.  The backup appears to be complete.
But it would be convenient to avoid having to restore from scratch,
and I'm hoping this might help you guys too in some way.  I really
like btrfs, and I would like to provide you with whatever info might
contribute something.


 Maybe someone else can say whether nodatacow as a subvolume mount
 option will apply this to the entire volume.

At the moment, I'm only trying to mount the whole volume, just so I
could recover and scrub it, although as I mentioned in my earlier
email, the scrub aborts with no report of why and with 0 errors.



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-10 Thread Timothy Normand Miller
Hi, everyone,

I have a four-drive RAID1 array, and since yesterday, some problem has
rendered it unmountable (read/write anyhow).  One drive reports a read
error, so maybe the drive is failing, but I've had that happen before,
and it was easy to swap in a new drive.  This time, two more drives
are reporting that they failed to read the system array.  I managed
to mount it read-only (by specifying the node of the fourth drive) and
rsync everything to a backup drive.  Now I'd like to try to repair.
This is where I'm running into problems.  Since I can't mount it
read-write, I can't do a scrub, so I tried btrfs check --repair, and
this is what I got:

# btrfs check --repair /dev/sde
enabling repair mode
Checking filesystem on /dev/sde
UUID: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
checking extents
ref mismatch on [1667931533312 524288] extent item 1, found 2
attempting to repair backref discrepency for bytenr 1667931533312
Ref doesn't match the record start and is compressed, please take a
btrfs-image of this file system and send it to a btrfs developer so
they can complete this functionality for bytenr 1667931639808
failed to repair damaged filesystem, aborting

Since this specifically told me to contact a developer, I figured this
is something you guys want to know about.  :)

Also, I was wondering if perhaps someone can help me figure out how to
repair it.

There are only two files that appear to be unrecoverable when I rsync,
and I can restore those from an earlier backup.  Since I can't mount
read/write, I can't go and delete those files, so I seem to be stuck.



BTRFS works beautifully with single-drive configurations.  I have
several, and I've never had a problem.  On the other hand, I seem to
have LOTS of trouble with 4-drive RAID1.  I get OOPSes regularly.
I've tried reporting them on bugzilla.kernel.org, but it doesn't
appear that btrfs devs actually use that.  Is this list a better place
to report those?


Thanks for the help!

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-10 Thread Timothy Normand Miller
On Mon, Aug 10, 2015 at 6:52 PM, Chris Murphy li...@colorremedies.com wrote:
 Four needed things:
 - kernel version

4.1.0-gentoo-r1, although I have also tried 4.1.4.

 - btrfs-progs version

4.1.2

 - complete dmesg for the failed mount

It really doesn't say much.  I have things like this:
[8.643535] BTRFS info (device sdc): disk space caching is enabled
[8.643789] BTRFS: failed to read the system array on sdc
[8.706062] BTRFS: open_ctree failed
[8.707124] BTRFS info (device sdc): disk space caching is enabled
[8.710924] BTRFS: failed to read the system array on sdc
[8.766080] BTRFS: open_ctree failed
[8.766903] BTRFS info (device sdc): setting nodatacow, compression disabled
[8.766905] BTRFS info (device sdc): disk space caching is enabled
[8.767152] BTRFS: failed to read the system array on sdc
[8.936019] BTRFS: open_ctree failed
[8.936906] BTRFS info (device sdc): disk space caching is enabled
[8.939922] BTRFS: failed to read the system array on sdc
[8.995984] BTRFS: open_ctree failed
[8.996796] BTRFS info (device sdc): disk space caching is enabled
[8.997093] BTRFS: failed to read the system array on sdc
[9.125936] BTRFS: open_ctree failed
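
As a quick way to see which BTRFS messages repeat in output like the above, something along these lines might help. It is only a sketch: summarize_btrfs is a made-up helper name, and it is no substitute for attaching the complete, unedited dmesg to the bug report.

```shell
# Sketch: count distinct BTRFS messages in kernel log text read from stdin.
# The leading "[timestamp]" is stripped so repeated messages group together.
# summarize_btrfs is a hypothetical helper name.
summarize_btrfs() {
    grep 'BTRFS' | sed 's/^ *\[[^]]*\] *//' | sort | uniq -c | sort -rn
}

# Sample input: a few of the lines pasted in this message.
summarize_btrfs <<'EOF'
[8.643535] BTRFS info (device sdc): disk space caching is enabled
[8.643789] BTRFS: failed to read the system array on sdc
[8.706062] BTRFS: open_ctree failed
[8.710924] BTRFS: failed to read the system array on sdc
[8.766080] BTRFS: open_ctree failed
EOF
```

On a live system, "dmesg | summarize_btrfs" gives the summary, while "dmesg > dmesg-full.txt" saves the whole log for attaching.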

Also, when I manually try to mount, I get things like this:

# mount /mnt/btrfs
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

For this fstab entry:
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/btrfs btrfs compress=lzo,noatime,space_cache 0 2

# mount -t btrfs /dev/sdd /mnt/btrfs
mount: wrong fs type, bad option, bad superblock on /dev/sdd,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.


 - complete btrfs check output (you mostly have this but since the
 version isn't included, it's not clear this is the entire output)

I pasted it all.


 The last two can be included as attachments in a bugzilla.kernel.org
 bug report and the URL posted in this thread. Typically MUA wrapping
 nerfs the dmesg making it hard to read, so attachments to a bug report
 are better.

Well, if I get something lengthy, I'll attach it to my bug report.
Did the information I reported help at all?  I think that btrfs just
isn't being informative about the problem.  Are there other commands I
can run to get more detailed reports?

BTW, I tried disconnecting the drive with the bad sector.  I still get
all the same errors and can't repair.


 Bugs get reported both in bugzilla and on the list.
 https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#How_do_I_report_bugs_and_issues.3F

 Sometimes it takes a while for devs to respond; bugs also get worked
 on even without responses, just because there are so many improvements
 each release.


 --
 Chris Murphy



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project