Hello Gautam,

First, let me express my gratitude for the in-depth analysis. I applaud the engineering virtue. Well done!

I would approach this as follows.

I would contact the RPM maintainers at SuSE and ask why rpm needs to take an exclusive lock. I am not in a position to claim which solution (rpm in RHEL vs. rpm in SuSE) is "better". However, the weaker constraint seems preferable as long as there is no risk of data corruption. I was QE for RPM in RHEL many years ago, and I don't remember any issues with weak locks.

As for OpenSCAP, we start a probe whenever it is needed and stop it only when the scan completes. That has performance benefits (I have yet to see a scanner that performs quicker than OpenSCAP :)). Each probe also holds a cache of things that have already been queried. So killing the probe is out of the question in general.

This specific case is different, however. We already drop all of the cache when we do remediation (because the cache would still return pre-remediation values). So perhaps you can write code to stop all probes before remediation starts. The cache is cleaned here:

https://github.com/OpenSCAP/openscap/blob/78c8706d961270f1878d0639bbceee3f3fb7623f/src/XCCDF/xccdf_session.c#L1413

Good luck!
~š.

On 03/31/2016 08:32 AM, S, Gautam wrote:
> Hello folks,
>
> I am looking at a rather obscure issue related to the way the RPM
> library behaves on SUSE. While trying to run remediation for a rule
> related to RPMs, the process hangs until it finally times out.
>
> # oscap xccdf eval --profile test --remediate sles11-xccdf.xml
> Title   Uninstall bind Package
> Rule    package_bind_removed
> Ident   CCE-27030-6
> Result  fail
>
> --- Starting Remediation ---
>
> Title   Uninstall bind Package
> Rule    package_bind_removed
> Ident   CCE-27030-6
>
> <<Hangs for a while here>>
>
> Result  error
>
> I have collected openscap verbose logs and RPM logs and it looks like
> there is a deadlock.
>
> From RPM verbose mode logs while the fix is running the "rpm -e" operation:
>
> D: opening db environment /var/lib/rpm/Packages create:cdb:mpool:private
> D: opening db index /var/lib/rpm/Packages create mode=0x42
> warning: waiting for exclusive lock on /var/lib/rpm/Packages
> error: cannot get exclusive lock on /var/lib/rpm/Packages
> D: closed db index /var/lib/rpm/Packages
> D: closed db environment /var/lib/rpm/Packages
> error: cannot open Packages index using db3 - Operation not permitted (1)
>
> Using strace, we can see that it attempts to acquire this WR lock
> multiple times until it finally times out and returns failure:
>
> fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
>
> RPM does not get exclusive mode lock because Openscap seems to be
> acquiring the lock in read mode:
>
> # lsof /var/lib/rpm/Packages   [Command run while the hang is seen]
> COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
> probe_rpm 16875 root  3rR   REG    8,2 53420032 352263 /var/lib/rpm/Packages
> rpm       16891 root   3u   REG    8,2 53420032 352263 /var/lib/rpm/Packages
>
> # ps -a   [Command run while the hang is seen]
>   PID TTY          TIME CMD
> 16859 pts/0    00:00:00 oscap
> 16865 pts/0    00:00:00 probe_system_in
> 16870 pts/0    00:00:00 probe_family
> 16875 pts/0    00:00:00 probe_rpminfo
> 16885 pts/0    00:00:00 probe_system_in
> 16890 pts/0    00:00:00 bash
> 16891 pts/0    00:00:00 rpm
> 16934 pts/2    00:00:00 ps
> #
>
> The RPM library call made in function
> src/OVAL/probes/unix/linux/rpminfo.c:probe_init, rpmtsCreate() and the
> subsequent rpmtsInitIterator() result in a read lock being taken on the
> file /var/lib/rpm/Packages.
>
> This read lock seems to be released when rpmtsFree() call is made from
> probe_fini.
>
> The particular probe_rpminfo in question is not related to the OVAL
> check of the rule but related to the CPE platform check.
>
> From the additional traces I added to openscap:
>
> D: probe_rpminfo: ("seap.msg" ":id" 0 (("rpminfo_object" ":id" "oval:org.open-scap.cpe.sles-release:obj:1" ":oval_version" "5.10.1" ) (("name" ":operation" 5 ":var_check" 1 ) "sles-release" ) ) ) [probe_rpminfo(7045):Thream Name(7f901b36c700):seap-packet.c:904:SEAP_packet_recv]
> …
> I: probe_rpminfo: gautam : probe_init: Init 1 rpmts [probe_rpminfo(7050):Thream Name(7f45fcd527c0):rpminfo.c:325:probe_init]
> <<Fix runs between the rpmts create and free .i.e lock acquire and release>>
> I: probe_rpminfo: gautam : probe_fini: Free 1 rpmts [probe_rpminfo(7045):Thream Name(7f90215a37c0):rpminfo.c:343:probe_fini]
>
> Probe_fini is called in this thread only upon receiving SIGTERM.
>
> *_Why does this work on RHEL?_*
>
> On RHEL, the implementation of the RPM command does not seem to be
> trying to acquire an exclusive mode lock, it also locks in read mode only:
>
> D: opening db environment /var/lib/rpm cdb:mpool:joinenv
> D: opening db index /var/lib/rpm/Packages create mode=0x42
> D: sanity checking 1 elements      >> No message about lock acquisition here!
> D: running pre-transaction scripts
>
> # lsof /var/lib/rpm/Packages
> COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
> probe_rpm 33539 root  3rR   REG    8,3 75472896 1048584 /var/lib/rpm/Packages
> rpm       33555 root  3uR   REG    8,3 75472896 1048584 /var/lib/rpm/Packages
>
> Is there some design document that explains how openscap spawns the CPE
> platform check? I am trying to understand if the probe can relinquish
> the lock by freeing the RPMTs structure before executing the remediation
> script.
>
> Thank you.
>
> Regards,
> Gautam.
>
> _______________________________________________
> Open-scap-list mailing list
> Open-scap-list@redhat.com
> https://www.redhat.com/mailman/listinfo/open-scap-list

~š.
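
For illustration, the lock conflict described in this thread, and the effect of stopping the probe before remediation, can be reproduced outside of OpenSCAP. The following is only a minimal Python sketch under stated assumptions, not OpenSCAP or RPM code: a temporary file stands in for /var/lib/rpm/Packages, a forked child stands in for the long-lived probe, and the names `try_exclusive_lock`, `hold_read_lock_until_sigterm` and `demo` are hypothetical helpers made up for this example. It assumes a Linux system with POSIX record locks.

```python
import fcntl
import os
import signal
import tempfile


def try_exclusive_lock(path):
    """Return True if a non-blocking exclusive (write) lock can be taken."""
    with open(path, "r+b") as f:
        try:
            # Comparable to the F_SETLK / F_WRLCK request seen in the strace output.
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False  # EAGAIN or EACCES: another process holds a lock
        fcntl.lockf(f, fcntl.LOCK_UN)
        return True


def hold_read_lock_until_sigterm(path, ready_fd):
    """Hypothetical stand-in for a probe: take a read lock and keep it."""
    signal.signal(signal.SIGTERM, signal.SIG_DFL)  # SIGTERM ends the process
    f = open(path, "rb")
    fcntl.lockf(f, fcntl.LOCK_SH)  # shared (read) lock on the whole file
    os.write(ready_fd, b"1")       # tell the parent the lock is now held
    while True:
        signal.pause()             # sleep until a signal arrives


def demo():
    """Return (lock obtained while probe runs, lock obtained after SIGTERM)."""
    fd, path = tempfile.mkstemp()  # assumption: stand-in for the Packages file
    os.close(fd)
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        hold_read_lock_until_sigterm(path, write_end)
        os._exit(0)
    os.close(write_end)
    os.read(read_end, 1)           # wait until the child holds its read lock
    os.close(read_end)
    while_probe_runs = try_exclusive_lock(path)
    os.kill(pid, signal.SIGTERM)   # stop the "probe" before "remediation"
    os.waitpid(pid, 0)             # its lock is released when it exits
    after_probe_stopped = try_exclusive_lock(path)
    os.unlink(path)
    return while_probe_runs, after_probe_stopped


if __name__ == "__main__":
    print(demo())
```

In this sketch the exclusive lock is refused while the reader is alive and is granted once the reader has been terminated, which mirrors the idea of stopping all probes before remediation starts. How the real probes are stopped inside OpenSCAP is not shown here.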