1. It's md-raid, with an LVM on top, and this is running in a virtual
   machine with LVM also enabled.
2. Originally, I was working from the Arch LiveCD, but I later created
   another disk to install ArchBang to.
3. I'm waiting for the check to complete.
4. SMART comes up clean.

smartctl -x /dev/sdg | grep SCT
SCT capabilities: (0x003d) SCT Status supported.
                           SCT Error Recovery Control supported.
                           SCT Feature Control supported.
                           SCT Data Table supported.
GP/S Log at address 0xe0 has 1 sectors [SCT Command/Status]
GP/S Log at address 0xe1 has 1 sectors [SCT Data Transfer]
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
SCT Temperature History Version: 2
SCT Error Recovery Control:

5. It returns a value of 30.

I'm running chunk-recover, but I'm not going to let it write anything. I
figure it'll take a while for it to scan, given the large size of the drive.

On 22.08.2013, at 18:58, Chris Murphy <li...@colorremedies.com> wrote:

> Non-expert on btrfs errors, so hopefully someone else will still reply with
> recovery advice. I have some foundational questions on the setup that may
> relate, if you don't already know what precipitated this failure:
>
>
> 1.
> You said it's md raid5, but I see /dev/mapper/main--storage--vg-root and dm-1
> or dm-2, so I wonder if this is md raid with LVM on top; or if this is LVM
> raid5 (which directly implements raid5 at LV level, without mdadm, but does
> use md code underneath)?
>
> 2.
> In one dmesg I see /dev/dm-2 referenced with errors, and in another
> /dev/dm-1. Is it actually the same btrfs volume, and if so I wonder why it's
> sometimes being mapped to a different dm device?
>
> 3.
> If it's an md device, when was the last time a scrub check was run?
> echo check > /sys/block/mdX/md/sync_action
> then after that completes:
> cat /sys/block/mdX/md/mismatch_cnt
>
> Or if LVM raid5, I think this is only recently added:
> http://www.redhat.com/archives/lvm-devel/2013-April/msg00042.html
>
> 4.
> smartctl -x for each drive; are there any indications of reallocated sectors,
> pending sectors, bad blocks, ECC error, CRC or UDMA error? The above command
> should also return the SCT Error Recovery Control value for each drive;
> what's that value?
>
> 5.
> What is returned for any one of the drives:
>
> cat /sys/block/sdX/device/timeout
>
> Thanks,
>
> Chris Murphy
>
>
> On Aug 22, 2013, at 1:38 PM, Nicholas Lee <em...@nickle.es> wrote:
>
>> Full pastebin here: http://cwillu.com:8080/96.245.194.45#6
>>
>> [ 9.213212] Btrfs loaded
>> [ 9.245673] device fsid 2ffb2450-f74f-4cfb-a3be-bb5e3c6d32ec devid 1
>> transid 23568 /dev/dm-1
>> [ 102.886834] device fsid 2ffb2450-f74f-4cfb-a3be-bb5e3c6d32ec devid 1
>> transid 23568 /dev/mapper/main--storage--vg-root
>> [ 102.888348] btrfs: enabling auto recovery
>> [ 102.888354] btrfs: disabling disk space caching
>> [ 102.888357] btrfs: disabling disk space caching
>> [ 102.911068] BTRFS critical (device dm-1): unable to find logical
>> 1781900460032 len 4096
>> [ 102.911103] BTRFS emergency (device dm-1): No mapping for
>> 1781900460032-1781900464128
>>
>> [ 102.911108] btrfs: failed to read tree root on dm-1
>> [ 102.911186] BTRFS critical (device dm-1): unable to find logical
>> 1781900460032 len 4096
>> [ 102.911217] BTRFS emergency (device dm-1): No mapping for
>> 1781900460032-1781900464128
>>
>> [ 102.911222] btrfs: failed to read tree root on dm-1
>> [ 102.911235] BTRFS critical (device dm-1): unable to find logical
>> 1198824710144 len 4096
>> [ 102.911240] BTRFS emergency (device dm-1): No mapping for
>> 1198824710144-1198824714240
>>
>> [ 102.911243] btrfs: failed to read tree root on dm-1
>> [ 102.911255] BTRFS critical (device dm-1): unable to find logical
>> 1198518919168 len 4096
>> [ 102.911286] BTRFS emergency (device dm-1): No mapping for
>> 1198518919168-1198518923264
>>
>> [ 102.911290] btrfs: failed to read tree root on dm-1
>> [ 102.911302] BTRFS critical (device dm-1): unable to find logical
>> 582755782656 len 4096
>> [ 102.911308] BTRFS emergency (device dm-1): No mapping for
>> 582755782656-582755786752
>>
>> [ 102.911311] btrfs: failed to read tree root on dm-1
>> [ 102.986797] btrfs: open_ctree failed
>>
>>
>> On 22.08.2013, at 15:23, Nicholas Lee <em...@nickle.es> wrote:
>>
>>> After updating the kernel and using btrfs-progs-git from the AUR, I'm now
>>> getting this output. Does this yield any new insight?
>>>
>>> [ 473.305408] btrfs: failed to read tree root on dm-2
>>> [ 473.305555] BTRFS critical (device dm-2): unable to find logical
>>> 1781900460032 len 4096
>>> [ 473.305591] BTRFS emergency (device dm-2): No mapping for
>>> 1781900460032-1781900464128
>>>
>>>
>>> On 22.08.2013, at 10:09, Mitch Harder <mitch.har...@sabayonlinux.org> wrote:
>>>
>>>> On Thu, Aug 22, 2013 at 1:47 AM, Nicholas Lee <em...@nickle.es> wrote:
>>>>
>>>>> [ 45.914275] ------------[ cut here ]------------
>>>>> [ 45.914406] kernel BUG at fs/btrfs/volumes.c:4417!
>>>>> [ 45.914489] invalid opcode: 0000 [#1] PREEMPT SMP
>>>>
>>>> I can't say if this will fix your problem or not, but the 3.10.x
>>>> kernel has a patch to pass this error back instead of halting with a
>>>> BUG() at this point.
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> Chris Murphy
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html