Hi,

I am getting this message on one of our OSTs:

PANIC: zfs: accessing past end of object 29/7 (size=33792 access=33792+128)
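
If I'm reading this right, "29/7" means object 7 in objset 29, and the write was 128 bytes starting exactly at the current end of the object (offset 33792 of a 33792-byte object). Given the tgt_client_new() -> tgt_client_data_write() path in the traces below, my guess is that this is the per-client connection data (the last_rcvd-style object) being extended by one 128-byte client slot as a client connects, though I may be misreading it. I assume I could confirm what object 29/7 actually is with zdb, along these lines (pool and dataset names are placeholders; I have not run this yet):

    zdb -d <pool>                  # list datasets; find the one whose ID is 29
    zdb -dddd <pool>/<dataset> 7   # dump dnode 7 of that dataset, including its size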

The affected OST now seems to reject new mounts from clients, and the connection count that lctl dl reports for its obdfilter device keeps increasing but never seems to decrease.
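
In case the exact numbers matter, this is how I am watching it (the device and fsname below are just placeholders for ours):

    lctl dl | grep obdfilter
    #  3 UP obdfilter <fsname>-OST0004 <fsname>-OST0004_UUID 423
    # the trailing number (the device reference count) only ever climbs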

This is Lustre 2.7.58 with ZFS 0.6.4.2.

Can anyone help me diagnose and fix whatever is going wrong here? I've included the stack dumps below.
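
One thing I turned up while searching: zfs_panic_recover() appears to downgrade this panic to a warning when the zfs_recover module parameter is set. I am wary of flipping that blind on a production server, but for reference I believe it would look like this:

    echo 1 > /sys/module/zfs/parameters/zfs_recover    # at runtime
    # or persistently, via /etc/modprobe.d/zfs.conf:
    #   options zfs zfs_recover=1

Would that be safe here, or would it just mask real damage?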

Thanks,
bob


2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.781874] Showing stack for process 24449
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.781876] Pid: 24449, comm: ll_ost00_078 Tainted: P --------------- 2.6.32-504.16.2.el6_lustre #7
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.781878] Call Trace:
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.781902] [<ffffffffa0406f8d>] ? spl_dumpstack+0x3d/0x40 [spl]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.781908] [<ffffffffa040701d>] ? vcmn_err+0x8d/0xf0 [spl]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.781950] [<ffffffffa0465a46>] ? RW_WRITE_HELD+0x66/0xb0 [zfs]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.781970] [<ffffffffa0466eb8>] ? dbuf_rele_and_unlock+0x268/0x3f0 [zfs]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.781991] [<ffffffffa04687ba>] ? dbuf_read+0x5ca/0x8a0 [zfs]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782024] [<ffffffffa04bb032>] ? zfs_panic_recover+0x52/0x60 [zfs]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782045] [<ffffffffa0471e3b>] ? dmu_buf_hold_array_by_dnode+0x41b/0x560 [zfs]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782068] [<ffffffffa0472205>] ? dmu_buf_hold_array+0x65/0x90 [zfs]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782090] [<ffffffffa0472668>] ? dmu_write+0x68/0x1a0 [zfs]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782147] [<ffffffffa08fa0ae>] ? lprocfs_oh_tally+0x2e/0x50 [obdclass]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782173] [<ffffffffa103f311>] ? osd_write+0x1d1/0x390 [osd_zfs]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782206] [<ffffffffa0926aad>] ? dt_record_write+0x3d/0x130 [obdclass]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782305] [<ffffffffa0ba7575>] ? tgt_client_data_write+0x165/0x1b0 [ptlrpc]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782347] [<ffffffffa0bab575>] ? tgt_client_data_update+0x335/0x680 [ptlrpc]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782388] [<ffffffffa0bac298>] ? tgt_client_new+0x3d8/0x6a0 [ptlrpc]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782407] [<ffffffffa117fad3>] ? ofd_obd_connect+0x363/0x400 [ofd]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782443] [<ffffffffa0b12158>] ? target_handle_connect+0xe58/0x2d30 [ptlrpc]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782450] [<ffffffff8106d1a5>] ? enqueue_entity+0x125/0x450
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782457] [<ffffffff8105870c>] ? check_preempt_curr+0x7c/0x90
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782462] [<ffffffff81064a2e>] ? try_to_wake_up+0x24e/0x3e0
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782481] [<ffffffffa07da6ca>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782524] [<ffffffffa0bb6f52>] ? tgt_request_handle+0x5b2/0x1230 [ptlrpc]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782564] [<ffffffffa0b5f5d1>] ? ptlrpc_main+0xe41/0x1920 [ptlrpc]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782570] [<ffffffff81014959>] ? sched_clock+0x9/0x10
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782576] [<ffffffff81529e1e>] ? thread_return+0x4e/0x7d0
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782615] [<ffffffffa0b5e790>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782622] [<ffffffff8109e71e>] ? kthread+0x9e/0xc0
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782626] [<ffffffff8109e680>] ? kthread+0x0/0xc0
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782632] [<ffffffff8100c20a>] ? child_rip+0xa/0x20
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782636] [<ffffffff8109e680>] ? kthread+0x0/0xc0
2017-02-08T23:02:23-05:00 umdist01.aglt2.org kernel: [11630254.782641] [<ffffffff8100c200>] ? child_rip+0x0/0x20


Later, that same process showed:
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773156] LNet: Service thread pid 24449 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773163] Pid: 24449, comm: ll_ost00_078
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773164]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773165] Call Trace:
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773181] [<ffffffff81010f85>] ? show_trace_log_lvl+0x55/0x70
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773194] [<ffffffff8152966e>] ? dump_stack+0x6f/0x76
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773249] [<ffffffffa0407035>] vcmn_err+0xa5/0xf0 [spl]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773373] [<ffffffffa0465a46>] ? RW_WRITE_HELD+0x66/0xb0 [zfs]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773393] [<ffffffffa0466eb8>] ? dbuf_rele_and_unlock+0x268/0x3f0 [zfs]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773412] [<ffffffffa04687ba>] ? dbuf_read+0x5ca/0x8a0 [zfs]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773444] [<ffffffffa04bb032>] zfs_panic_recover+0x52/0x60 [zfs]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773463] [<ffffffffa0471e3b>] dmu_buf_hold_array_by_dnode+0x41b/0x560 [zfs]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773483] [<ffffffffa0472205>] dmu_buf_hold_array+0x65/0x90 [zfs]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773502] [<ffffffffa0472668>] dmu_write+0x68/0x1a0 [zfs]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773579] [<ffffffffa08fa0ae>] ? lprocfs_oh_tally+0x2e/0x50 [obdclass]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773617] [<ffffffffa103f311>] osd_write+0x1d1/0x390 [osd_zfs]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773659] [<ffffffffa0926aad>] dt_record_write+0x3d/0x130 [obdclass]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773860] [<ffffffffa0ba7575>] tgt_client_data_write+0x165/0x1b0 [ptlrpc]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773899] [<ffffffffa0bab575>] tgt_client_data_update+0x335/0x680 [ptlrpc]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773938] [<ffffffffa0bac298>] tgt_client_new+0x3d8/0x6a0 [ptlrpc]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773961] [<ffffffffa117fad3>] ofd_obd_connect+0x363/0x400 [ofd]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.773997] [<ffffffffa0b12158>] target_handle_connect+0xe58/0x2d30 [ptlrpc]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774002] [<ffffffff8106d1a5>] ? enqueue_entity+0x125/0x450
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774006] [<ffffffff8105870c>] ? check_preempt_curr+0x7c/0x90
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774010] [<ffffffff81064a2e>] ? try_to_wake_up+0x24e/0x3e0
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774055] [<ffffffffa07da6ca>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774111] [<ffffffffa0bb6f52>] tgt_request_handle+0x5b2/0x1230 [ptlrpc]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774163] [<ffffffffa0b5f5d1>] ptlrpc_main+0xe41/0x1920 [ptlrpc]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774167] [<ffffffff81014959>] ? sched_clock+0x9/0x10
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774170] [<ffffffff81529e1e>] ? thread_return+0x4e/0x7d0
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774225] [<ffffffffa0b5e790>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774230] [<ffffffff8109e71e>] kthread+0x9e/0xc0
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774232] [<ffffffff8109e680>] ? kthread+0x0/0xc0
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774235] [<ffffffff8100c20a>] child_rip+0xa/0x20
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774237] [<ffffffff8109e680>] ? kthread+0x0/0xc0
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774239] [<ffffffff8100c200>] ? child_rip+0x0/0x20
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774240]
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630454.774243] LustreError: dumping log to /tmp/lustre-log.1486613143.24449
2017-02-08T23:05:43-05:00 umdist01.aglt2.org kernel: [11630455.164028] Pid: 23795, comm: ll_ost01_026

At least four different PIDs showed this same situation; they are all OST service threads, with names like ll_ost01_063.
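
For what it's worth, I counted them with something along these lines (log path assumed; our syslog goes to /var/log/messages):

    grep -oE 'Pid: [0-9]+, comm: ll_ost[0-9_]+' /var/log/messages | sort -u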


