On 5/13/25 12:34, Colin Percival wrote:
On 5/13/25 10:22, Pete Wright wrote:
So I've found an interesting pattern: the above messages get printed
to /var/log/messages and the dmesg buffer when i "su" to root,
apparently:
May 9 19:19:23 airflow-nfs su[66523]: ec2-user to root on /dev/pts/3
May 9 19:19:23 airflow-nfs kernel: Found a Tx that wasn't completed on time, qid 2, index 593. 10 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
May 9 19:19:23 airflow-nfs kernel: Found a Tx that wasn't completed on time, qid 2, index 220. 1 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
[...]
I have no idea what that means, but it certainly feels like an
interesting data point. i'm ssh'ing in as the ec2-user, then "su -" to
become root, and as you can see from the timestamps something triggers
those log events. i'm not seeing any other occurrences of these log
messages outside of su'ing, either. this is a very vanilla system; no
krb auth or other network interactions should happen when i become root.
Ooh, very interesting, and points to something I had wondered about
earlier.
There should be a line 'hw.broken_txfifo="1"' in /boot/loader.conf; can you
try removing that and see if the problem goes away? (In fact, it's a sysctl,
so you can flip it on and off without taking the system down.)
If the system reproducibly prints that warning with broken_txfifo=1 and does
not print the warning with broken_txfifo=0, we have the culprit. And I can
just remove that from EC2 images; it's a workaround for an old emulation bug
which *should* be long since fixed in all EC2 instance types.
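The on/off comparison described above could be made a bit more mechanical
with a small helper that counts the warning lines in the log. This is only a
rough sketch: count_tx_warnings is a made-up name, not an existing tool, and
it assumes the warnings end up in /var/log/messages as in the excerpt quoted
earlier.

```shell
#!/bin/sh
# Sketch only: count_tx_warnings is a hypothetical helper, not part of FreeBSD.
# It counts the "Tx not completed on time" warnings in a log file.
count_tx_warnings() {
    logfile="${1:-/var/log/messages}"   # default path taken from the thread
    # grep -c prints the number of matching lines. It exits non-zero when the
    # count is 0, so "|| true" keeps the function usable under "set -e".
    grep -c "Found a Tx that wasn't completed" "$logfile" || true
}
```

One way to use it: note the count, run "su -", and note the count again, once
with "sysctl hw.broken_txfifo=1" and once with "sysctl hw.broken_txfifo=0".
If the count only grows while the knob is set to 1, that would match the
result described above.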
oh interesting! cool i've toggled that sysctl knob:
# sysctl hw.broken_txfifo=0
hw.broken_txfifo: 1 -> 0
#
i did an initial test and it looks good so far. i'll let it soak for the
rest of the day today and check in tomorrow. thanks Colin!
-pete
--
Pete Wright