A CentOS 7 box here recently started being very slow to give the Password 
prompt when using sudo.  (25 seconds!)  I eventually tracked this down to the 
following complaint in the kernel message log:

> Feb 26 12:55:05 badboy kernel: INFO: task fprintd:1932 blocked for more than 
> 120 seconds.
> Feb 26 12:55:05 badboy kernel: "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 26 12:55:05 badboy kernel: fprintd         D ffff88021f7300f8     0  1932 
>      1 0x00000080
> Feb 26 12:55:05 badboy kernel: ffff88022354bdc0 0000000000000086 
> ffff8800c54e8000 ffff88022354bfd8
> Feb 26 12:55:05 badboy kernel: ffff88022354bfd8 ffff88022354bfd8 
> ffff8800c54e8000 ffff88021f7300f0
> Feb 26 12:55:05 badboy kernel: ffff88021f7300f4 ffff8800c54e8000 
> 00000000ffffffff ffff88021f7300f8
> Feb 26 12:55:05 badboy kernel: Call Trace:
> Feb 26 12:55:05 badboy kernel: [<ffffffff8163b9e9>] 
> schedule_preempt_disabled+0x29/0x70
> Feb 26 12:55:05 badboy kernel: [<ffffffff816396e5>] 
> __mutex_lock_slowpath+0xc5/0x1c0
> Feb 26 12:55:05 badboy kernel: [<ffffffff81638b4f>] mutex_lock+0x1f/0x2f
> Feb 26 12:55:05 badboy kernel: [<ffffffff81449729>] 
> read_descriptors+0x39/0x110
> Feb 26 12:55:05 badboy kernel: [<ffffffff8125c28b>] ? read+0x6b/0x1f0
> Feb 26 12:55:05 badboy kernel: [<ffffffff8125c2fb>] read+0xdb/0x1f0
> Feb 26 12:55:05 badboy kernel: [<ffffffff811de43c>] vfs_read+0x9c/0x170
> Feb 26 12:55:05 badboy kernel: [<ffffffff811def8f>] SyS_read+0x7f/0xe0
> Feb 26 12:55:05 badboy kernel: [<ffffffff81645909>] 
> system_call_fastpath+0x16/0x1b


Since fprintd is the fingerprint reader daemon and this server doesn’t have 
such a peripheral, I removed the service.  Now sudo is fast again. [*]

Although that solves my immediate problem, while debugging all of this I 
noticed substantially the same error message also appearing for libvirtd:

> Feb 26 12:57:05 badboy kernel: INFO: task libvirtd:1514 blocked for more than 
> 120 seconds.
> Feb 26 12:57:05 badboy kernel: "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 26 12:57:05 badboy kernel: libvirtd        D ffff88021f7300f8     0  1514 
>      1 0x00000080
> Feb 26 12:57:05 badboy kernel: ffff8800c08ffde0 0000000000000086 
> ffff8800c550e780 ffff8800c08fffd8
> Feb 26 12:57:05 badboy kernel: ffff8800c08fffd8 ffff8800c08fffd8 
> ffff8800c550e780 ffff88021f7300f0
> Feb 26 12:57:05 badboy kernel: ffff88021f7300f4 ffff8800c550e780 
> 00000000ffffffff ffff88021f7300f8
> Feb 26 12:57:05 badboy kernel: Call Trace:
> Feb 26 12:57:05 badboy kernel: [<ffffffff8163b9e9>] 
> schedule_preempt_disabled+0x29/0x70
> Feb 26 12:57:05 badboy kernel: [<ffffffff816396e5>] 
> __mutex_lock_slowpath+0xc5/0x1c0
> Feb 26 12:57:05 badboy kernel: [<ffffffff81638b4f>] mutex_lock+0x1f/0x2f
> Feb 26 12:57:05 badboy kernel: [<ffffffff81449100>] 
> manufacturer_show+0x20/0x50
> Feb 26 12:57:05 badboy kernel: [<ffffffff813f1c20>] dev_attr_show+0x20/0x60
> Feb 26 12:57:05 badboy kernel: [<ffffffff8125928a>] sysfs_read_file+0x9a/0x1a0
> Feb 26 12:57:05 badboy kernel: [<ffffffff811de43c>] vfs_read+0x9c/0x170
> Feb 26 12:57:05 badboy kernel: [<ffffffff811def8f>] SyS_read+0x7f/0xe0
> Feb 26 12:57:05 badboy kernel: [<ffffffff81645909>] 
> system_call_fastpath+0x16/0x1b


I don’t want to employ the same solution in this case — i.e. disable libvirtd — 
because while I don’t yet use KVM on that server, I actually had plans to do so 
soon.  I could host those VMs elsewhere, but that feels like sweeping the 
problem under the rug.

I want to fix this, but I’m now stuck.

I’ve ruled out several possible causes already:

1. It isn’t DNS.  DNS responds quickly.

2. It isn’t a flaky OS drive.  Swapping the spinning rust drive out for an SSD 
was already on the wish list for this system, so I took the opportunity to 
reinstall the OS fresh on a brand new SSD.  The same problem occurred shortly 
after booting CentOS 7.2 for the first time, and it continues to happen 
periodically.

The only thing I’ve copied over from the old spinning drive so far is /home and 
/usr/local, and I can’t see how that could affect the kernel.  Besides, the 
problem still occurs while booted into single-user mode.

3. It isn’t the RAID card.  A RAID verify pass completed successfully, as did 
smartctl -t long tests on all of the individual drives.  I even disabled the 
RAID card’s OPROM and blacklisted its driver at one point, just to see if the 
driver or RAID BIOS were the cause of the problem.  I haven’t actually pulled 
the card yet, but that seems unlikely to solve anything, given that the array 
seems to be storing data reliably.

4. It isn’t the RAM.  It just passed a memtest86+ run.  That probably 
exonerates the CPU and north bridge, too.  (I used v4.20 from the CentOS 7.2 
install ISO.)

5. It isn’t stale packages.  The OS reinstall proves that.  It’s currently 
running the tip of CentOS 7.2 and still showing the same symptoms.

6. It isn’t high-uptime kernel space cruft, since it’s been rebooted a bunch of 
times during all of this, including several hard reboots.

Rebooting is especially painful at the moment because the system hangs on 
shutdown waiting for libvirtd to respond.  While I had fprintd installed, that 
was causing 3-minute hangs on boot, too.

Very frustrating!



[*] In case it isn’t clear how fprintd could affect sudo, it’s because fprintd 
installs a PAM module, and sudo uses PAM for authentication.  If fprintd is 
stuck, the PAM call stalls until it gives up and moves on.

I tracked this down with strace.  The 25 seconds above is the timeout value 
passed to poll(2); that timeout is hit while sudo (via PAM) is trying to talk 
to dbus, since apparently fprintd communicates via dbus.

Because the problem is in PAM and not sudo, it also affected su.  It did not 
affect console or ssh user logins for some reason.

I’m posting these details in case this diagnosis helps someone.  While chasing 
this, I found a bunch of postings on the net from people who have run into such 
problems before, but none of the threads mentioned this particular failure 
mode.  For all I know, I’m the first person it’s ever happened to, so this 
needs to be in the archives.
_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos

Reply via email to