Re: [BUG] kernel BUG at fs/btrfs/extent_io.c:2062 (v4.2.0-rc8)

Dāvis Mosāns Mon, 31 Aug 2015 17:09:06 -0700

2015-08-31 18:14 GMT+00:00 Dāvis Mosāns <davis...@gmail.com>:
> I'm getting a kernel crash and complete system lockup when trying to
> access the journal on a two-disk btrfs filesystem with data/metadata as
> RAID1.
>
> I can't get a proper log because the whole system hangs and even kdump
> fails; it seems it doesn't start, or I'm doing something wrong.
>
> Also, because there are several call traces and they all get printed on
> screen within a few seconds, I could photograph only the last few.
> But I managed to get some low-quality blurry photos from an 80 FPS
> recording.
>
> From them I saw
>
> kernel BUG at fs/btrfs/extent_io.c:2062
> extent_i...@2062.png => http://i.imgur.com/uuxOGIR.png
>
> kernel BUG at fs/btrfs/extent_io.c:2140
> extent_i...@2140.png => http://i.imgur.com/j5xrt7w.png
>
> kernel BUG at fs/btrfs/extent_io.c:2338
> extent_io.c@2338_0.png => http://i.imgur.com/EosplAu.png
> extent_io.c@2338_1.png => http://i.imgur.com/rsE9qNT.png
>
> kernel BUG at fs/btrfs/volumes.c:5399
> volumes.c@5399_0.png => http://i.imgur.com/iV9zqAv.png
> volumes.c@5399_1.png => http://i.imgur.com/VCyr07R.png
>
>
> And better photos
>
> BUG: scheduling while atomic: kworker/u16
> scheduling_while_atomic_0.jpg => http://i.imgur.com/asHjcM9.jpg
> scheduling_while_atomic_1.jpg => http://i.imgur.com/OJSFDUx.jpg
> scheduling_while_atomic_2.jpg => http://i.imgur.com/0nHQin8.jpg
> scheduling_while_atomic_3.jpg => http://i.imgur.com/ZmzOh7f.jpg
>
> Watchdog detected hard LOCKUP on cpu
> watchdog_detected_hard_LOCKUP_0.jpg => http://i.imgur.com/6W4FlfI.jpg
> watchdog_detected_hard_LOCKUP_1.jpg => http://i.imgur.com/WxxGozJ.jpg
> watchdog_detected_hard_LOCKUP_2.jpg => http://i.imgur.com/0Mmifwf.jpg
>
> BUG: unable to handle kernel paging request
> unable_to_handle_kernel_paging_request.jpg => http://i.imgur.com/4Sz4v96.jpg
>
> BUG: unable to handle kernel
> unable_to_handle_kernel.jpg => http://i.imgur.com/T0x7K4a.jpg
>
>
> The weird thing is that it crashes only sometimes: reading all the files
> doesn't crash it, only opening the journal with journalctl does.
> Also, btrfs scrub and balance finish without any errors.
> Even btrfs check and check --repair completed successfully without
> finding anything to repair. This crash happened on v4.1.6 too, and
> now I'll recompile v4.2 as it has been released.
>
>
> I've been getting this crash since I decided to test how well Linux
> handles the loss of one disk on btrfs RAID1 (I just pulled one disk out).
> It kept working, but there were some call traces, and when I plugged it
> back in, btrfs failed to write to it. After a few minutes the system
> froze, but before that a SMART test passed on that disk.
> Then I rebooted and ran scrub, which fixed the errors on that disk.
> Next I tried to test the other disk, and for it I executed
> echo 1 > /sys/block/sdf/device/delete
> which caused an immediate system hang.
> And now this filesystem crashes the kernel when I try to view the journal.
> I think RAID1 should handle cases where one disk
> disappears or is corrupted, but currently it doesn't work and
> crashes the whole system.
>



I found the file that is causing the kernel crash. Most of the time it
gives an I/O error:
/var/log/journal/873a5f55f2aa4b33b2568baca40e6a91/system@00051e80d8810e86-e5a1ec29d9167e9f.journal~:
Input/output error

but sometimes it causes an instant system freeze:
cat system@00051e80d8810e86-e5a1ec29d9167e9f.journal~ > /dev/null
<system freeze>

There's nothing in the kernel logs when the freeze happens.
Also, any user who can read that file can crash the kernel, a nice DoS.
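
To narrow down where in the file the I/O error shows up, the read can be
done in small chunks while tracking the offset. This is only a sketch: the
function name and the 4 KiB chunk size are my own choices, and on the
affected filesystem it may freeze the machine just like cat does.

```python
import errno
import os


def first_read_error(path, chunk_size=4096):
    """Return (offset, errno_name) for the first failing read, or None.

    Hypothetical helper, not part of any btrfs tool. It reads the file
    sequentially with pread() so the failing offset is known exactly.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError as exc:
                # e.g. EIO for the "Input/output error" seen above
                return offset, errno.errorcode.get(exc.errno, str(exc.errno))
            if not data:
                return None  # reached end of file without errors
            offset += len(data)
    finally:
        os.close(fd)
```

A result such as (offset, 'EIO') would give a starting point for mapping
the failing range to an extent; None means the whole file read cleanly on
that attempt.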

Here's a btrfs-image of that filesystem's /dev/sdb:
https://drive.google.com/file/d/0B82_Tz1_6URAQmV5LTZHUmR4YXM/view?usp=sharing
sha256sum
88fb561b4a581319ae18c1f27b6ac108e9c08ff80954e192cb3201cc5d4c19ff raid1_sdb.img
size 142M
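
For anyone downloading the image, the digest above can be checked before
use. A minimal sketch using Python's hashlib; the function names are mine,
and the path passed in is wherever the download was saved.

```python
import hashlib

# Digest posted above for raid1_sdb.img
EXPECTED = "88fb561b4a581319ae18c1f27b6ac108e9c08ff80954e192cb3201cc5d4c19ff"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so the whole image is never in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_posted_digest(path):
    """True if the file at `path` has the sha256 posted in this mail."""
    return sha256_of(path) == EXPECTED
```

Calling matches_posted_digest("raid1_sdb.img") should return True for an
intact download.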

The only differences between the btrfs-images of the two disks:
image from /dev/sdb  => image from /dev/sdf
0x00000400 2fc3d988 => 8c421133
0x000004c9 02 => 01
0x0000050b 7ed7472cd5d44f5e842ede789208dfd9 => 3ceab04840a3412da65cab36dba5c17e
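
A byte-level comparison like the one above can be reproduced with a few
lines of Python. This is a sketch of one way to do it, not necessarily how
the list above was produced; the function names are made up for
illustration, and both images are assumed to fit in memory (142M each is
fine).

```python
def diff_runs(a: bytes, b: bytes):
    """Yield (offset, bytes_a, bytes_b) for each maximal run of differing bytes.

    Only the common prefix length is compared; a size mismatch between the
    two inputs is not reported here.
    """
    n = min(len(a), len(b))
    i = 0
    while i < n:
        if a[i] == b[i]:
            i += 1
            continue
        start = i
        while i < n and a[i] != b[i]:
            i += 1
        yield start, a[start:i], b[start:i]


def format_run(offset, old, new):
    """Render one run in the same 'offset old => new' layout used above."""
    return f"0x{offset:08x} {old.hex()} => {new.hex()}"
```

Reading both image files with open(path, "rb").read() and printing
format_run(*run) for every run from diff_runs() gives one line per
differing region.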

Mount options: rw,noatime,compress=lzo,space_cache,autodefrag
Features:
* big_metadata
* compress_lzo
* default_subvol
* extended_iref
* mixed_backref
* no_holes
* skinny_metadata

$ btrfs filesystem show
Label: 'RAID'  uuid: 247e6249-6de1-45cb-9dd0-fa8a654234bf
        Total devices 2 FS bytes used 16.38GiB
        devid    1 size 2.73TiB used 18.03GiB path /dev/sdb
        devid    2 size 2.73TiB used 18.03GiB path /dev/sdf

$ btrfs filesystem usage
Overall:
    Device size:                   5.46TiB
    Device allocated:             36.06GiB
    Device unallocated:            5.42TiB
    Device missing:                  0.00B
    Used:                         32.75GiB
    Free (estimated):              2.71TiB      (min: 2.71TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:               48.00MiB      (used: 0.00B)

Data,RAID1: Size:17.00GiB, Used:16.24GiB
   /dev/sdb       17.00GiB
   /dev/sdf       17.00GiB

Metadata,RAID1: Size:1.00GiB, Used:136.64MiB
   /dev/sdb        1.00GiB
   /dev/sdf        1.00GiB

System,RAID1: Size:32.00MiB, Used:16.00KiB
   /dev/sdb       32.00MiB
   /dev/sdf       32.00MiB

Unallocated:
   /dev/sdb        2.71TiB
   /dev/sdf        2.71TiB
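
The totals above are consistent with each other under RAID1. A small
arithmetic check, based on my reading of how the summary lines relate (not
taken from the btrfs-progs source); the inputs are the rounded values
printed above, so small rounding differences are expected.

```python
MIB_PER_GIB = 1024
GIB_PER_TIB = 1024

# Per-device chunk allocation: Data + Metadata + System
per_device_gib = 17.00 + 1.00 + 32.00 / MIB_PER_GIB      # 18.03125

# Two devices, each carrying one RAID1 copy
device_allocated_gib = 2 * per_device_gib                # reported: 36.06GiB

# "FS bytes used" times the data ratio of 2.00
raw_used_gib = 16.38 * 2.00                              # reported: 32.75GiB

# Assumption: free estimate = unallocated / ratio + unused space in data chunks
free_estimated_tib = 5.42 / 2.00 + (17.00 - 16.24) / GIB_PER_TIB  # reported: 2.71TiB

print(f"{device_allocated_gib:.2f} {raw_used_gib:.2f} {free_estimated_tib:.2f}")
```

So the space accounting looks sane; nothing here hints at the problem with
the journal file.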


$ btrfs scrub start -B -d -R /dev/sdb
scrub device /dev/sdb (id 1) done
        scrub started at Mon Aug 31 20:58:45 2015 and finished after 00:01:29
        data_extents_scrubbed: 359177
        tree_extents_scrubbed: 8746
        data_bytes_scrubbed: 17442004992
        tree_bytes_scrubbed: 143294464
        read_errors: 0
        csum_errors: 0
        verify_errors: 0
        no_csum: 42403
        csum_discards: 100132
        super_errors: 0
        malloc_errors: 0
        uncorrectable_errors: 0
        unverified_errors: 0
        corrected_errors: 0
        last_physical: 21504196608

$ btrfs scrub start -B -d -R /dev/sdf
scrub device /dev/sdf (id 2) done
        scrub started at Mon Aug 31 21:18:33 2015 and finished after 00:01:31
        data_extents_scrubbed: 359177
        tree_extents_scrubbed: 8746
        data_bytes_scrubbed: 17442004992
        tree_bytes_scrubbed: 143294464
        read_errors: 0
        csum_errors: 0
        verify_errors: 0
        no_csum: 42403
        csum_discards: 100132
        super_errors: 0
        malloc_errors: 0
        uncorrectable_errors: 0
        unverified_errors: 0
        corrected_errors: 0
        last_physical: 21484273664
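
To compare the raw counters of both runs without reading them by eye, the
"name: value" lines can be parsed into a dict. A minimal sketch; the helper
names are hypothetical, and the error counter names are the ones printed
by scrub above.

```python
ERROR_COUNTERS = (
    "read_errors", "csum_errors", "verify_errors", "super_errors",
    "malloc_errors", "uncorrectable_errors", "unverified_errors",
    "corrected_errors",
)


def parse_scrub_stats(text):
    """Turn 'name: integer' lines from `btrfs scrub start -R` output into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        # Header lines (device name, timestamps) have no integer value and are skipped
        if sep and value.strip().isdigit():
            stats[key] = int(value)
    return stats


def nonzero_errors(stats):
    """Return only the error counters that are present and not zero."""
    return {k: stats[k] for k in ERROR_COUNTERS if stats.get(k, 0) != 0}
```

For both outputs above nonzero_errors() comes back empty, and
data_bytes_scrubbed / 2**30 is about 16.24, matching the Data "Used"
figure from btrfs filesystem usage.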

$ btrfs balance start -v
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x0): balancing
  METADATA (flags 0x0): balancing
  SYSTEM (flags 0x0): balancing
Done, had to relocate 19 out of 19 chunks

$ btrfs check --repair --check-data-csum /dev/sdb
enabling repair mode
Checking filesystem on /dev/sdb
UUID: 247e6249-6de1-45cb-9dd0-fa8a654234bf
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 17581105170 bytes used err is 0
total csum bytes: 16863596
total tree bytes: 143294464
total fs tree bytes: 111984640
total extent tree bytes: 12009472
btree space waste bytes: 25424343
file data blocks allocated: 17710305280
 referenced 20970795008
btrfs-progs v4.1.2


$ btrfs check --repair --check-data-csum /dev/sdf
enabling repair mode
Checking filesystem on /dev/sdf
UUID: 247e6249-6de1-45cb-9dd0-fa8a654234bf
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 17581105170 bytes used err is 0
total csum bytes: 16863596
total tree bytes: 143294464
total fs tree bytes: 111984640
total extent tree bytes: 12009472
btree space waste bytes: 25424343
file data blocks allocated: 17710305280
 referenced 20970795008
btrfs-progs v4.1.2


It seems btrfs-progs thinks everything is fine with the filesystem even
though some files give an I/O error or crash the kernel on RAID1.