Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
Oh, it went read-only because it OOPSed:

[39710.419966] ------------[ cut here ]------------
[39710.419969] WARNING: CPU: 1 PID: 5624 at
fs/btrfs/extent-tree.c:6226 __btrfs_free_extent+0x873/0xc80()
[39710.419970] Modules linked in: nfsd auth_rpcgss oid_registry
nfs_acl ipv6 binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek
ppdev snd_hda_codec_generic x86_pkg_temp_thermal coretemp kvm_intel
snd_hda_intel snd_hda_controller kvm snd_hda_codec snd_hda_core
microcode snd_hwdep pcspkr snd_pcm snd_timer i2c_i801 snd lpc_ich
mfd_core parport_pc battery xts gf128mul aes_x86_64 cbc sha256_generic
libiscsi scsi_transport_iscsi tg3 ptp pps_core libphy sky2 r8169
pcnet32 mii e1000 bnx2 fuse nfs lockd grace sunrpc reiserfs multipath
linear raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx raid1 raid0 dm_snapshot dm_bufio dm_crypt dm_mirror
dm_region_hash dm_log dm_mod firewire_core hid_sunplus hid_sony
hid_samsung hid_pl hid_petalynx hid_gyration usbhid uhci_hcd
usb_storage ehci_pci
[39710.419991]  ehci_hcd aic94xx libsas qla2xxx megaraid_sas
megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx
3w_ mptsas scsi_transport_sas mptfc scsi_transport_fc mptspi
mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx
gdth advansys initio BusLogic arcmsr aic7xxx aic79xx
scsi_transport_spi sg sata_mv sata_sil24 sata_sil pata_marvell
[39710.420003] CPU: 1 PID: 5624 Comm: kworker/u8:7 Tainted: GW
  4.1.4-gentoo #1
[39710.420003] Hardware name: ECS H87H3-M/H87H3-M, BIOS 4.6.5 07/16/2013
[39710.420005] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[39710.420006]   8197e672 81794418

[39710.420008]  81049cbc 01846cc5e000 880064d12000
e000
[39710.420009]  fffe  8127bc03
000fc277
[39710.420010] Call Trace:
[39710.420012]  [81794418] ? dump_stack+0x40/0x50
[39710.420014]  [81049cbc] ? warn_slowpath_common+0x7c/0xb0
[39710.420015]  [8127bc03] ? __btrfs_free_extent+0x873/0xc80
[39710.420018]  [81353ef0] ? cpumask_next_and+0x30/0x50
[39710.420019]  [81075c93] ? enqueue_task_fair+0x2c3/0xdb0
[39710.420021]  [812e054c] ? btrfs_delayed_ref_lock+0x2c/0x260
[39710.420022]  [81280ffc] ? __btrfs_run_delayed_refs+0x42c/0x1280
[39710.420024]  [8113cedd] ? __sb_start_write+0x3d/0xe0
[39710.420025]  [81285f7e] ? btrfs_run_delayed_refs.part.58+0x5e/0x270
[39710.420026]  [81286228] ? delayed_ref_async_start+0x78/0x90
[39710.420028]  [812c56f3] ? normal_work_helper+0x73/0x2a0
[39710.420029]  [8105ebbc] ? process_one_work+0x13c/0x3d0
[39710.420031]  [8105eeb3] ? worker_thread+0x63/0x480
[39710.420032]  [8105ee50] ? process_one_work+0x3d0/0x3d0
[39710.420033]  [81063a5e] ? kthread+0xce/0xf0
[39710.420034]  [81063990] ? kthread_create_on_node+0x180/0x180
[39710.420036]  [8179ced2] ? ret_from_fork+0x42/0x70
[39710.420037]  [81063990] ? kthread_create_on_node+0x180/0x180
[39710.420038] ---[ end trace 0b4fe6057cd7a1a4 ]---


Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
So I tried deleting the files that I think are the problem, and the
file system suddenly went read-only, and I got this in dmesg:

First, a bunch of messages like these:
[39710.420118]  item 45 key (1668296151040 168 524288) itemoff 1557 itemsize 53
[39710.420118]  extent refs 1 gen 166914 flags 1
[39710.420119]  extent data backref root 949 objectid 440675
offset 2621440 count 1
[39710.420120]  item 46 key (1668296675328 168 524288) itemoff 1504 itemsize 53
[39710.420120]  extent refs 1 gen 166914 flags 1
[39710.420121]  extent data backref root 949 objectid 440675
offset 3145728 count 1
[39710.420121]  item 47 key (1668297199616 168 524288) itemoff 1451 itemsize 53
[39710.420122]  extent refs 1 gen 166914 flags 1
[39710.420122]  extent data backref root 949 objectid 440675
offset 3670016 count 1
[39710.420123]  item 48 key (1668297723904 168 524288) itemoff 1398 itemsize 53
[39710.420123]  extent refs 1 gen 166914 flags 1
[39710.420124]  extent data backref root 949 objectid 440675
offset 4194304 count 1
[39710.420125]  item 49 key (1668298248192 168 524288) itemoff 1345 itemsize 53
[39710.420125]  extent refs 1 gen 166914 flags 1
[39710.420126]  extent data backref root 949 objectid 440675
offset 4718592 count 1
[39710.420126]  item 50 key (1668298772480 168 524288) itemoff 1292 itemsize 53
[39710.420127]  extent refs 1 gen 166914 flags 1
[39710.420127]  extent data backref root 949 objectid 440675
offset 5242880 count 1
[39710.420128] BTRFS error (device sdc): unable to find ref byte nr
1668272218112 parent 0 root 949  owner 1032823 offset 655360
[39710.420129] BTRFS: error (device sdc) in __btrfs_free_extent:6232:
errno=-2 No such entry
[39710.420131] BTRFS: error (device sdc) in
btrfs_run_delayed_refs:2821: errno=-2 No such entry
[39710.431108] pending csums is 5795840


Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
I didn't quite understand profile and convert, since I can't find a
profile option.  Is this something your patch adds?

Before I do that, however, I have to deal with this:

compute0 ~ # btrfs device delete missing /mnt/btrfs
ERROR: error removing the device 'missing' - Input/output error

[13058.298763] BTRFS warning (device sdc): csum failed ino 596 off
623218688 csum 2756583412 expected csum 4104700738
[13058.298775] BTRFS warning (device sdc): csum failed ino 596 off
623222784 csum 2568037276 expected csum 275151414
[13058.298782] BTRFS warning (device sdc): csum failed ino 596 off
623226880 csum 2227564114 expected csum 3824181799
[13058.298788] BTRFS warning (device sdc): csum failed ino 596 off
623230976 csum 3298529275 expected csum 1155389604
[13058.298794] BTRFS warning (device sdc): csum failed ino 596 off
623235072 csum 2603391790 expected csum 1861925401
[13058.298801] BTRFS warning (device sdc): csum failed ino 596 off
623239168 csum 2044148708 expected csum 3227559459
[13058.298807] BTRFS warning (device sdc): csum failed ino 596 off
623243264 csum 615351306 expected csum 2720021058
[13058.329747] BTRFS warning (device sdc): csum failed ino 596 off
623218688 csum 2756583412 expected csum 4104700738
[13058.329759] BTRFS warning (device sdc): csum failed ino 596 off
623222784 csum 2568037276 expected csum 275151414
[13058.329770] BTRFS warning (device sdc): csum failed ino 596 off
623226880 csum 2227564114 expected csum 3824181799

Because of this, it won't delete the missing device.  How do I get
past this?  I'm pretty sure the problem is in some files I want to
delete anyhow.  Would deleting them solve the problem?
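To find out which files the checksum warnings above refer to, the inode numbers can be pulled out of the kernel log. This is a minimal sketch, not something taken from the thread: the function name `csum_failed_inodes` is hypothetical, and it assumes the warnings keep the `csum failed ino N` wording shown above.

```shell
# Read kernel log text on stdin and print each distinct inode number
# that appears in a "csum failed ino N" warning, one per line.
csum_failed_inodes() {
    grep -o 'csum failed ino [0-9][0-9]*' | awk '{print $4}' | sort -un
}

# Example use (not run here):
#   dmesg | csum_failed_inodes
#   btrfs inspect-internal inode-resolve 596 /mnt/btrfs
```

`btrfs inspect-internal inode-resolve` maps an inode number to a path within the subvolume that the given path belongs to. Note that while a device delete or balance is relocating data, the inode in these warnings may belong to an internal relocation tree rather than to a user file, in which case the lookup will not return a useful path.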

On Sat, Aug 15, 2015 at 12:59 AM, Anand Jain anand.j...@oracle.com wrote:

>> BTW, when this is all over with, how do I make sure there are really
>> two copies of everything?  Will a scrub verify this?  Should I run a
>> balance operation?

> Please use 'btrfs bal profile and convert' to migrate any single chunks
> (created while fewer devices were RW-able) back to your desired raid1.
> Do this when all the devices are back online. Kindly note that there is
> a bug in the btrfs volume manager (VM): you won't be able to bring a
> device online without an unmount and mount (I am working on a fix). The
> btrfs-progs output will be wrong in this case, so don't depend on it too
> much.
> To see the kernel's view of a btrfs volume, I generally use:
> https://patchwork.kernel.org/patch/5816011/

> In its output, a null bdev indicates that the device has been scanned but
> is not yet part of the VM. An unmount and mount will then bring the
> device back into the VM.

>> After applying Anand's patch, I was able to mount my 4-drive RAID1
>> and bring a new fourth drive online.  However, something weird happened
>> where the first delete missing only deleted one missing drive and
>> only did a partial duplication.  I've posted a bug report here:

> That seems normal to me, unless I am missing something.

> Thanks, Anand



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-15 Thread Timothy Normand Miller
Here's the associated bug report with the full dmesg:

https://bugzilla.kernel.org/show_bug.cgi?id=102941


Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-14 Thread Timothy Normand Miller
BTW, when this is all over with, how do I make sure there are really
two copies of everything?  Will a scrub verify this?  Should I run a
balance operation?
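One way to approach the first part of this question is to look at the chunk profiles that `btrfs filesystem df` reports: a scrub verifies checksums on the copies that exist, but it does not create a second copy for chunks that were allocated as single. The sketch below is an illustration only; the function name `chunks_not_in_profile` is hypothetical, and it assumes the usual `Type, PROFILE: total=..., used=...` output format.

```shell
# Read `btrfs filesystem df` output on stdin and print every chunk type
# whose profile differs from the expected one given as $1 (e.g. raid1).
# GlobalReserve is skipped because it is always reported as single.
chunks_not_in_profile() {
    awk -F'[,:] *' -v want="$1" '
        NF >= 2 && $1 != "GlobalReserve" && tolower($2) != tolower(want) {
            print $1 ": " $2
        }'
}

# Example use (not run here):
#   btrfs filesystem df /mnt/btrfs | chunks_not_in_profile raid1
```

If the function prints nothing, every reported chunk type already uses the expected profile; any line it prints names a chunk type that a balance with a convert filter would need to migrate.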

On Fri, Aug 14, 2015 at 11:29 PM, Timothy Normand Miller
theo...@gmail.com wrote:
> After applying Anand's patch, I was able to mount my 4-drive RAID1 and
> bring a new fourth drive online.  However, something weird happened
> where the first delete missing only deleted one missing drive and
> only did a partial duplication.  I've posted a bug report here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=102901



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-14 Thread Anand Jain


> BTW, when this is all over with, how do I make sure there are really
> two copies of everything?  Will a scrub verify this?  Should I run a
> balance operation?

Please use 'btrfs bal profile and convert' to migrate any single chunks
(created while fewer devices were RW-able) back to your desired raid1.
Do this when all the devices are back online. Kindly note that there is
a bug in the btrfs volume manager (VM): you won't be able to bring a
device online without an unmount and mount (I am working on a fix). The
btrfs-progs output will be wrong in this case, so don't depend on it too
much.
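The advice above most likely refers to the convert filters of `btrfs balance start`, which has no option literally named "profile". The following is a minimal sketch under that assumption: the helper name `balance_convert_cmd` is hypothetical, the mount point `/mnt/btrfs` is taken from elsewhere in this thread, and the helper only prints the command instead of running it.

```shell
# Print (do not run) a balance command that converts data and metadata
# chunks to the profile in $1 on the filesystem mounted at $2.
# The "soft" modifier leaves chunks that already have the target profile alone.
balance_convert_cmd() {
    printf 'btrfs balance start -dconvert=%s,soft -mconvert=%s,soft %s\n' \
        "$1" "$1" "$2"
}

balance_convert_cmd raid1 /mnt/btrfs
```

Printing the command first makes it easy to review before running it as root. As the message says, this is best done once all devices are back online.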

To see the kernel's view of a btrfs volume, I generally use:
https://patchwork.kernel.org/patch/5816011/

In its output, a null bdev indicates that the device has been scanned but
is not yet part of the VM. An unmount and mount will then bring the
device back into the VM.


> After applying Anand's patch, I was able to mount my 4-drive RAID1
> and bring a new fourth drive online.  However, something weird happened
> where the first delete missing only deleted one missing drive and
> only did a partial duplication.  I've posted a bug report here:

That seems normal to me, unless I am missing something.


Thanks, Anand


delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction

2015-08-14 Thread Timothy Normand Miller
After applying Anand's patch, I was able to mount my 4-drive RAID1 and
bring a new fourth drive online.  However, something weird happened
where the first delete missing only deleted one missing drive and
only did a partial duplication.  I've posted a bug report here:

https://bugzilla.kernel.org/show_bug.cgi?id=102901

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project