> Is that problem related to the Native POSIX Thread Library
> issues that are described in the 8.0.94/RELEASE-NOTES file?
> If so, that doc says that the workaround is to either set
> "LD_ASSUME_KERNEL=2.2.5" or boot with the option "nosysinfo"
I'll try that out as a workaround. Thanks (and I should have read that in
the first place after switching up from earlier 8.0.9x versions).
> We've found one problem with rpm and SIGPIPE. If you do something
> like "rpm -qa | /bin/true" as root, you'll get a stale lock. You'll
> also get stale locks any time you use SIGKILL or any other
Ok, I'll check that. If this is the culprit, then it's likely that the
problems I was seeing yesterday come from using rpm as part of shell
scripts and having the output feed other scripts.
> That doesn't prove it is a kernel bug, because rebooting also clears
> rpm's lock files.
Right, that was the wrong culprit. So, looking a little deeper into
this...
Looping over:
rpm -Uvh cpan2rpm-2.014-1.noarch.rpm
rpm -e cpan2rpm
[ side note: cpan2rpm is quite useful. ]
...appears to go a hundred iterations without producing a hang.
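For anyone who wants to repeat the loop, here is a minimal sketch of how it
could be scripted. The function name cycle_package and the RPM_CMD override
are my own hypothetical helpers (RPM_CMD only exists so the loop can be
dry-run with a stub); the two rpm invocations are the ones shown above.

```shell
#!/bin/sh
# Sketch only: repeat an install/erase cycle and stop at the first failure.
# RPM_CMD is a hypothetical override; it defaults to the real rpm binary.
RPM_CMD=${RPM_CMD:-rpm}

cycle_package() {
    pkgfile=$1          # e.g. cpan2rpm-2.014-1.noarch.rpm
    pkgname=$2          # e.g. cpan2rpm
    count=${3:-100}     # number of install/erase cycles
    i=1
    while [ "$i" -le "$count" ]; do
        "$RPM_CMD" -Uvh "$pkgfile" || {
            echo "install failed on iteration $i" >&2
            return 1
        }
        "$RPM_CMD" -e "$pkgname" || {
            echo "erase failed on iteration $i" >&2
            return 1
        }
        i=$((i + 1))
    done
    echo "completed $count iterations"
}
```

Usage would be something like "cycle_package cpan2rpm-2.014-1.noarch.rpm
cpan2rpm 100" as root. Note that a real hang shows up as the loop simply
stopping, not as a failure message.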
The hang can be reproduced reliably with this set of commands:
1. reboot
2. log in as root
3. rpm -qa | /bin/true
4. rpm -e cpan2rpm # installed previously
...confirming Matt's message. Once the "rpm -qa | /bin/true" command has
been issued, successive "rpm -e" and "rpm -U" commands reliably hang.
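When scripting the reproduction it helps to turn "hangs forever" into
something a script can detect. The following is a rough sketch that assumes
the timeout utility from GNU coreutils is installed; run_with_deadline is a
hypothetical helper name of mine, not anything rpm provides.

```shell
#!/bin/sh
# Sketch only: run a command under a deadline and report a suspected hang.
# Relies on GNU coreutils "timeout", which exits with status 124 when the
# deadline expires and the command had to be terminated.
run_with_deadline() {
    secs=$1
    shift
    timeout "$secs" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' did not finish within ${secs}s" >&2
    fi
    return "$status"
}
```

For example, "run_with_deadline 60 rpm -e cpan2rpm" after step 3. Bear in
mind that terminating a stuck rpm is itself a signal delivery, so the files
under /var/lib/rpm should be treated as suspect afterwards.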
The problem does not occur with this sequence:
1. reboot
2. log in as root
3. LD_ASSUME_KERNEL="2.2.5" rpm -qa | /bin/true
4. rpm -e cpan2rpm
5. rpm -Uvh cpan2rpm-2.014-1.noarch.rpm
...or with the sequence:
1. reboot
2. log in as root
3. rpm -qa | /bin/true
4. LD_ASSUME_KERNEL="2.2.5" rpm -e cpan2rpm
5. LD_ASSUME_KERNEL="2.2.5" rpm -Uvh cpan2rpm-2.014-1.noarch.rpm
...provided that all subsequent rpm commands are also prefixed with
LD_ASSUME_KERNEL="2.2.5".
Without a reboot, rpm also reliably stops hanging once you run
rm -rf /var/lib/rpm/__db.*, which is the same cleanup that rc.sysinit
performs at boot.
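A slightly more careful version of that cleanup could look like the sketch
below. The function name clear_rpm_regions and its optional directory
argument are hypothetical (the argument is only there so it can be tried on
a scratch directory first); it removes nothing but the __db.* files.

```shell
#!/bin/sh
# Sketch only: remove the __db.* region files from an rpm database
# directory and report how many were removed. Other files are left alone.
clear_rpm_regions() {
    dbdir=${1:-/var/lib/rpm}
    removed=0
    for f in "$dbdir"/__db.*; do
        [ -f "$f" ] || continue     # the glob matched nothing
        rm -f -- "$f" && removed=$((removed + 1))
    done
    echo "removed $removed region file(s) from $dbdir"
}
```

Run it as root with no argument to act on /var/lib/rpm, and only when no
rpm process is still using the database.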
So the release notes (which I should have applied before ranting) are
appropriate to this instance. Good to know there's a workaround and that
the cause is probably far more mundane than a kernel bug.
The waiting futex syscall in "rpm -e":
futex(0x4059130c, FUTEX_WAIT, 0, NULL <unfinished ...>
[EMAIL PROTECTED]:/proc/17240#grep '__db.' maps
40017000-4001b000 rw-s 00000000 08:11 229714 /var/lib/rpm/__db.001
40406000-40548000 rw-s 00000000 08:11 229715 /var/lib/rpm/__db.002
40548000-405b8000 rw-s 00000000 08:11 229716 /var/lib/rpm/__db.003
...is referencing an address inside the range of the mmapped file
"__db.003" (40548000-405b8000). Rpm's usage of shared regions is
interesting code reading that doesn't need to be rehashed here. The hang is
clearly a case of waiting on a mutex in a structure stored in that file.
Either the stale lock needs to be cleared as part of a lock reclamation
step, or rpm needs to be able to bail out earlier and alert the user that
there's a problem.
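The address-to-file lookup above can be checked mechanically instead of by
eye. This is a small sketch; find_mapping is a hypothetical helper of mine,
and it is only meant for maps lines shaped like the three shown above (it
makes no attempt to handle every line a full maps file can contain).

```shell
#!/bin/sh
# Sketch only: print the pathname of the mapping that contains an address.
# usage: find_mapping ADDR MAPSFILE
#   ADDR      hex address with 0x prefix, e.g. the futex address from strace
#   MAPSFILE  file holding lines in /proc/PID/maps format
find_mapping() {
    addr=$(( $1 ))
    while read -r range perms offset dev inode path; do
        [ -n "$range" ] || continue         # skip blank lines
        start=$(( 0x${range%-*} ))          # text before the dash
        end=$(( 0x${range#*-} ))            # text after the dash
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            echo "$path"
            return 0
        fi
    done < "$2"
    return 1
}
```

With the grep output above saved to a file, "find_mapping 0x4059130c FILE"
prints /var/lib/rpm/__db.003, since 0x40548000 <= 0x4059130c < 0x405b8000.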
Thanks for the support.
Cheers,
Rob
--
Phoebe-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/phoebe-list