Dear all, Recently, I'm stuck with some AFS issues.
AFS client hanged with the following log message. In this case, the AFS instance blocked and jobs failed to access any files located in AFS. I have to reboot the work node to recover service. Dec 6 15:03:18 bws0825 kernel: INFO: task afs_callback:19124 blocked for more than 120 seconds. Dec 6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 6 15:03:18 bws0825 kernel: afs_callback D ffff9860d826e180 0 19124 2 0x00000000 Dec 6 15:03:18 bws0825 kernel: Call Trace: Dec 6 15:03:18 bws0825 kernel: afs_callback D ffff9860d826e180 0 19124 2 0x00000000 Dec 6 15:03:18 bws0825 kernel: Call Trace: Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f Dec 6 15:03:18 bws0825 kernel: [<ffffffffc084dff4>] SRXAFSCB_InitCallBackState+0x34/0x470 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc0898047>] ? afs_xdr_vector+0x57/0x90 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc084f19e>] SRXAFSCB_InitCallBackState3+0xe/0x10 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08b6f43>] RXAFSCB_ExecuteRequest+0x6f3/0x8a0 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1b028ae>] ? getnstimeofday64+0xe/0x30 Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08ae589>] ? afs_mutex_exit+0x29/0x40 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08a6a5d>] rxi_ServerProc+0xcd/0x1e0 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08af017>] rx_ServerProc+0x87/0xe0 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc084eedd>] afs_RXCallBackServer+0x3d/0x50 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c76a5>] afsd_thread+0x1e5/0x730 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40 Dec 6 15:03:18 bws0825 kernel: INFO: task afs_rxevent:19127 blocked for more than 120 seconds. Dec 6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 6 15:03:18 bws0825 kernel: afs_rxevent D ffff9860cbbf6180 0 19127 2 0x00000000 Dec 6 15:03:18 bws0825 kernel: Call Trace: Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] afs_osi_TimedSleep+0x118/0x210 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20 Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08af575>] afs_rxevent_daemon+0x95/0x140 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c7af6>] afsd_thread+0x636/0x730 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40 Dec 6 15:03:18 bws0825 kernel: INFO: task afs_checkserver:19870 blocked for more than 120 seconds. Dec 6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 6 15:03:18 bws0825 kernel: afs_checkserver D ffff9860c7811040 0 19870 2 0x00000000 Dec 6 15:03:18 bws0825 kernel: Call Trace: Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] afs_osi_TimedSleep+0x118/0x210 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20 Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc0853b08>] afs_CheckServerDaemon+0x118/0x1a0 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c7930>] afsd_thread+0x470/0x730 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs] Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40 Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21 The following message is the specification of AFS cluster at IHEP. AFS clinet--sl7 [root@bws0825 ~]# rpm -qa|grep openafs openafs-1.6-sl-client-1.6.23-289.sl7.x86_64 openafs-1.6-sl-authlibs-1.6.23-289.sl7.x86_64 openafs-1.6-sl-devel-1.6.23-289.sl7.x86_64 openafs-1.6-sl-module-tools-1.6.23-289.sl7.x86_64 openafs-1.6-sl-krb5-1.6.23-289.sl7.x86_64 openafs-1.6-sl-1.6.23-289.sl7.x86_64 openafs-1.6-sl-authlibs-devel-1.6.23-289.sl7.x86_64 kmod-openafs-1.6-sl-1.6.23-1.SL76.el7.noarch kmod-openafs-1.6-sl-957-1.6.23-289.sl7.957.x86_64 AFS client-SL6: openafs-krb5-1.6.23-289.sl6.x86_64 openafs-client-1.6.23-289.sl6.x86_64 openafs-1.6.23-289.sl6.x86_64 openafs-kpasswd-1.6.23-289.sl6.x86_64 openafs-module-tools-1.6.23-289.sl6.x86_64 openafs-kernel-source-1.6.23-289.sl6.x86_64 openafs-firstboot-1.6-1.sl6.noarch openafs-authlibs-1.6.23-289.sl6.x86_64 kmod-openafs-696-1.6.20-257.sl6.696.x86_64 kmod-openafs-1.6.22.3-1.SL610.el6.noarch openafs-compat-1.6.23-289.sl6.x86_64 AFS server side: openafs-authlibs-1.6.20-256.sl6.x86_64 openafs-server-1.6.11-1.el6.x86_64 openafs-kernel-source-1.6.11-1.el6.x86_64 openafs-docs-1.6.11-1.el6.x86_64 openafs-authlibs-devel-1.6.20-256.sl6.x86_64 openafs-1.6.11-1.el6.x86_64 openafs-compat-1.6.11-1.el6.x86_64 openafs-devel-1.6.20-256.sl6.x86_64 openafs-client-1.6.11-1.el6.x86_64 kmod-openafs-1.6.11-1.2.6.32_504.3.3.el6.x86_64 openafs-krb5-1.6.11-1.el6.x86_64 Does the 1.6.23 is not compatible with the linux kernel or AFS server version? Any information you provided would be appreciated. Thanks. Regards, Qiulan huangql ==================================================================== Computing center,the Institute of High Energy Physics, CAS, China Qiulan Huang Tel: (+86) 10 8823 6087 P.O. Box 918-7 Fax: (+86) 10 8823 6839 Beijing 100049 P.R. China Email: huan...@ihep.ac.cn ===================================================================