Kernel crash on mount after SMR disk trouble

Jukka Larja Sat, 14 May 2016 01:30:07 -0700

In short:

I added two 8TB Seagate Archive SMR disks to a btrfs pool and tried to delete one of the old disks. After some errors I ended up with a file system that can be mounted read-only, but crashes the kernel if mounted normally. I tried btrfs check --repair (which noted that the space cache needs to be zeroed) and zeroing the space cache (via a mount parameter), but that didn't change anything.


Longer version:

I was originally running Debian Jessie with a fairly recent kernel (maybe 4.4), but somewhat older btrfs tools. After the trouble started, I tried updating (now running kernel 4.5.1 and tools 4.4.1). I checked the new disks with badblocks (no problems found), but based on some googling, Seagate's SMR disks seem to have various problems, so the root cause is probably one type or another of disk error.


Here's the output of btrfs fi show:

Label: none  uuid: 8b65962d-0982-449b-ac6f-1acc8397ceb9
        Total devices 12 FS bytes used 13.15TiB
        devid    1 size 3.64TiB used 3.36TiB path /dev/sde1
        devid    2 size 3.64TiB used 3.36TiB path /dev/sdg1
        devid    3 size 3.64TiB used 3.36TiB path /dev/sdh1
        devid    4 size 3.64TiB used 3.34TiB path /dev/sdf1
        devid    5 size 1.82TiB used 1.44TiB path /dev/sdi1
        devid    6 size 1.82TiB used 1.54TiB path /dev/sdl1
        devid    7 size 1.82TiB used 1.51TiB path /dev/sdk1
        devid    8 size 1.82TiB used 1.54TiB path /dev/sdj1
        devid    9 size 3.64TiB used 3.31TiB path /dev/sdb1
        devid   10 size 3.64TiB used 3.36TiB path /dev/sda1
        devid   11 size 7.28TiB used 168.00GiB path /dev/sdc1
        devid   12 size 7.28TiB used 168.00GiB path /dev/sdd1

The last two devices (11 and 12) are the new disks. After adding them, I first copied some new data in (about 130 GB), which seemed to go fine. Then I tried to remove disk 5. After some time (about 30 GiB written to 11 and 12), there were some errors, disk 11 or 12 dropped out and the fs went read-only. After some troubleshooting (googling), I decided the new disks were too iffy to trust and tried to remove them.

I don't remember exactly what errors I got, but the device delete operation was interrupted by errors at least once or twice before more serious trouble began. In between the attempts I updated the firmware of the HBA (an LSI 9300). After the final device delete attempt, the end result was that attempting to mount causes the kernel to crash. I then tried updating the kernel and running check --repair, but that hasn't helped. Mounting read-only seems to work perfectly, but I haven't tried copying everything to /dev/null or anything like that (just a few files).

The log of the crash (it is very repeatable) can be seen here: http://jane.aarghimedes.fi/~jlarja/tempe/btrfs-trouble/btrfs_crash_log.txt


Snippet from the start of that:

touko 12 06:41:22 jane kernel: BTRFS info (device sda1): disk space caching is enabled
touko 12 06:41:24 jane kernel: BTRFS info (device sda1): bdev /dev/sdd1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
touko 12 06:41:39 jane kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000000001f0
touko 12 06:41:39 jane kernel: IP: [<ffffffffc030e0ee>] can_overcommit+0x1e/0xf0 [btrfs]
touko 12 06:41:39 jane kernel: PGD 0
touko 12 06:41:39 jane kernel: Oops: 0000 [#1] SMP

My dmesg log is here: http://jane.aarghimedes.fi/~jlarja/tempe/btrfs-trouble/dmesg.log


Other information:
Linux jane 4.5.0-1-amd64 #1 SMP Debian 4.5.1-1 (2016-04-14) x86_64 GNU/Linux
btrfs-progs v4.4.1

btrfs fi df /mnt/Allosaurus/
    Data, RAID1: total=13.13TiB, used=13.07TiB
    Data, single: total=8.00MiB, used=0.00B
    System, RAID1: total=8.00MiB, used=1.94MiB
    System, single: total=4.00MiB, used=0.00B
    Metadata, RAID1: total=87.00GiB, used=85.24GiB
    Metadata, single: total=8.00MiB, used=0.00B
    GlobalReserve, single: total=512.00MiB, used=0.00B
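
As a rough cross-check of the figures above, here is a minimal Python sketch that adds up the per-device numbers from the btrfs fi show listing. It assumes the listing is pasted in as plain text; the names FI_SHOW, to_tib and parse_devices are made up for this illustration, and the unit handling only covers the GiB and TiB suffixes that appear in that listing. It does not talk to btrfs in any way.

```python
import re

# Device lines copied from the "btrfs fi show" output earlier in this message.
FI_SHOW = """\
devid    1 size 3.64TiB used 3.36TiB path /dev/sde1
devid    2 size 3.64TiB used 3.36TiB path /dev/sdg1
devid    3 size 3.64TiB used 3.36TiB path /dev/sdh1
devid    4 size 3.64TiB used 3.34TiB path /dev/sdf1
devid    5 size 1.82TiB used 1.44TiB path /dev/sdi1
devid    6 size 1.82TiB used 1.54TiB path /dev/sdl1
devid    7 size 1.82TiB used 1.51TiB path /dev/sdk1
devid    8 size 1.82TiB used 1.54TiB path /dev/sdj1
devid    9 size 3.64TiB used 3.31TiB path /dev/sdb1
devid   10 size 3.64TiB used 3.36TiB path /dev/sda1
devid   11 size 7.28TiB used 168.00GiB path /dev/sdc1
devid   12 size 7.28TiB used 168.00GiB path /dev/sdd1
"""

# Conversion factors to TiB (only the units seen in the listing; an assumption).
UNIT_TO_TIB = {"GiB": 1 / 1024, "TiB": 1.0}

LINE_RE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)(GiB|TiB)"
    r"\s+used\s+([\d.]+)(GiB|TiB)\s+path\s+(\S+)"
)


def to_tib(value, unit):
    """Convert a number with a GiB/TiB suffix to TiB."""
    return float(value) * UNIT_TO_TIB[unit]


def parse_devices(text):
    """Return {devid: {"size", "used", "path"}} with sizes in TiB."""
    devices = {}
    for m in LINE_RE.finditer(text):
        devices[int(m.group(1))] = {
            "size": to_tib(m.group(2), m.group(3)),
            "used": to_tib(m.group(4), m.group(5)),
            "path": m.group(6),
        }
    return devices


devices = parse_devices(FI_SHOW)
total_size = sum(d["size"] for d in devices.values())
total_used = sum(d["used"] for d in devices.values())

print(f"devices: {len(devices)}")
print(f"raw size: {total_size:.2f} TiB, allocated: {total_used:.2f} TiB")
for devid in sorted(devices):
    d = devices[devid]
    print(f"devid {devid:2d} {d['path']}: {d['size'] - d['used']:.2f} TiB unallocated")
```

Since nearly everything is RAID1, the allocated total should come out at roughly twice the sum of the RAID1 "total" values from btrfs fi df, give or take rounding in the printed figures.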

The data is either backups or media data duplicated elsewhere, so I'm in no great hurry and could fix everything with enough new disks and cp -R. However, it would save me a lot of trouble (and some money) if I could get this fixed otherwise. Of course, it would be nice in general for a future kernel not to crash when mounting a corrupted file system :) .


--
     ...Activity alien to life...
     Jukka Larja, jla...@iki.fi, 0407679919

"Our own Charlie D reckons that 18.2 per cent of Internet traffic is now pr0n, and if Intel's Netbust can make the Internet faster, can the sempr0n make pr0n faster?"

- The Inquirer, http://www.theinquirer.net/?article=16447 -
