Re: [Open-scap] RPM based remediation hangs on SUSE.
On 04/01/2016 05:07 PM, Šimon Lukašík wrote: > Hello Gautam, > > Firstly, let me just express gratitude for in depth analysis. I applaud > to engineering virtue. Well done! > > I would approach this thing as follows. I would connect with RPM > maintainers in SuSE, why it needs to get exclusive lock. > > I am not equipped to claim which solution (rpm in RHEL, vs. rpm in SuSE) > is "better". However, having the lesser constraint seems better until > there is risk for data corruption. I have been QE of RPM in RHEL many > years ago, and I don't remember any issues with weak locks. > > > Then, for OpenSCAP, we run the probe any time it is needed and we stop > it only at the scan completion. That has performance benefits (I still > have to see scanner that performs quicker than OpenSCAP :)). And then > each probe holds cache of things that have been already queried. So, > killing the probe is out of question in general. > > However, in this specific case. We already drop all the cache when we do > remediation (because cache would still return pre-remediated values). > So perhaps you can write code to stop all probes before remediation starts. > Ah, this paragraph is void. The killing is already done. :) The problem is that when we decide whether the specific element is applicable, we run the rpminfo probe again. https://github.com/OpenSCAP/openscap/blob/78c8706d961270f1878d0639bbceee3f3fb7623f/src/XCCDF_POLICY/xccdf_policy_remediate.c#L135 So, perhaps you can create SuSE specific patch to rpminfo probe to return the lock after each request? Best, ~š. > The clean of the cache is here > https://github.com/OpenSCAP/openscap/blob/78c8706d961270f1878d0639bbceee3f3fb7623f/src/XCCDF/xccdf_session.c#L1413 > > Good luck! > ~š. > > > On 03/31/2016 08:32 AM, S, Gautam wrote: >> Hello folks, >> >> >> >> I am looking at a rather obscure issue related to the way the RPM >> library behaves on SUSE. While trying to run remediation for a rule >> related to RPMs, the process hangs until it finally times out. >> >> >> >> # oscap xccdf eval --profile test --remediate sles11-xccdf.xml >> >> Title Uninstall bind Package >> >> Rulepackage_bind_removed >> >> Ident CCE-27030-6 >> >> Result fail >> >> >> >> >> >> --- Starting Remediation --- >> >> >> >> Title Uninstall bind Package >> >> Rulepackage_bind_removed >> >> Ident CCE-27030-6 >> >> >> >> <> >> >> >> >> Result error >> >> >> >> I have collected openscap verbose logs and RPM logs and it looks like >> there is a deadlock. >> >> >> >> From RPM verbose mode logs while the fix is running the “rpm –e” operation: >> >> >> >> D: opening db environment /var/lib/rpm/Packages create:cdb:mpool:private >> >> D: opening db index /var/lib/rpm/Packages create mode=0x42 >> >> warning: waiting for exclusive lock on /var/lib/rpm/Packages >> >> error: cannot get exclusive lock on /var/lib/rpm/Packages >> >> D: closed db index /var/lib/rpm/Packages >> >> D: closed db environment /var/lib/rpm/Packages >> >> error: cannot open Packages index using db3 - Operation not permitted (1) >> >> >> >> Using strace, we can see that it attempts to acquire this WR lock >> multiple times until it finally times out and returns failure: >> >> >> >> fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 >> EAGAIN (Resource temporarily unavailable) >> >> >> >> RPM does not get exclusive mode lock because Openscap seems to be >> acquiring the lock in read mode: >> >> >> >> # lsof /var/lib/rpm/Packages [Command run while the hang is seen] >> >> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME >> >> probe_rpm 16875 root3rR REG8,2 53420032 352263 >> /var/lib/rpm/Packages >> >> rpm 16891 root3u REG8,2 53420032 352263 >> /var/lib/rpm/Packages >> >> >> >> # ps –a [Command run while the hang is seen] >> >> PID TTY TIME CMD >> >> 16859 pts/000:00:00 oscap >> >> 16865 pts/000:00:00 probe_system_in >> >> 16870 pts/000:00:00 probe_family >> >> 16875 pts/000:00:00 probe_rpminfo >> >> 16885 pts/000:00:00 probe_system_in >> >> 16890 pts/000:00:00 bash >> >> 16891 pts/000:00:00 rpm >> >> 16934 pts/200:00:00 ps >> >> # >> >> >> >> The RPM library call made in function >> src/OVAL/probes/unix/linux/rpminfo.c:probe_init, rpmtsCreate() and the >> subsequent rpmtsInitIterator() result in a read lock being taken on the >> file /var/lib/rpm/Packages. >> >> This read lock seems to be released when rpmtsFree() call is made from >> probe_fini. >> >> >> >> The particular probe_rpminfo in question is not related to the OVAL >> check of the rule but related to the CPE platform check. >> >> >> >> From the additional traces I added to openscap: >> >> D: probe_rpminfo: ("seap.msg" ":id" 0 (("rpminfo_object" ":id" >> "oval:org.open-scap.cpe.sles-release:obj:1" ":oval_version" "5.10.1" ) >> (("name" ":operation" 5 ":var_check" 1 ) "sles-release" ) ) ) >>
Re: [Open-scap] RPM based remediation hangs on SUSE.
Hello Gautam, Firstly, let me just express gratitude for in depth analysis. I applaud to engineering virtue. Well done! I would approach this thing as follows. I would connect with RPM maintainers in SuSE, why it needs to get exclusive lock. I am not equipped to claim which solution (rpm in RHEL, vs. rpm in SuSE) is "better". However, having the lesser constraint seems better until there is risk for data corruption. I have been QE of RPM in RHEL many years ago, and I don't remember any issues with weak locks. Then, for OpenSCAP, we run the probe any time it is needed and we stop it only at the scan completion. That has performance benefits (I still have to see scanner that performs quicker than OpenSCAP :)). And then each probe holds cache of things that have been already queried. So, killing the probe is out of question in general. However, in this specific case. We already drop all the cache when we do remediation (because cache would still return pre-remediated values). So perhaps you can write code to stop all probes before remediation starts. The clean of the cache is here https://github.com/OpenSCAP/openscap/blob/78c8706d961270f1878d0639bbceee3f3fb7623f/src/XCCDF/xccdf_session.c#L1413 Good luck! ~š. On 03/31/2016 08:32 AM, S, Gautam wrote: > Hello folks, > > > > I am looking at a rather obscure issue related to the way the RPM > library behaves on SUSE. While trying to run remediation for a rule > related to RPMs, the process hangs until it finally times out. > > > > # oscap xccdf eval --profile test --remediate sles11-xccdf.xml > > Title Uninstall bind Package > > Rulepackage_bind_removed > > Ident CCE-27030-6 > > Result fail > > > > > > --- Starting Remediation --- > > > > Title Uninstall bind Package > > Rulepackage_bind_removed > > Ident CCE-27030-6 > > > > <> > > > > Result error > > > > I have collected openscap verbose logs and RPM logs and it looks like > there is a deadlock. > > > > From RPM verbose mode logs while the fix is running the “rpm –e” operation: > > > > D: opening db environment /var/lib/rpm/Packages create:cdb:mpool:private > > D: opening db index /var/lib/rpm/Packages create mode=0x42 > > warning: waiting for exclusive lock on /var/lib/rpm/Packages > > error: cannot get exclusive lock on /var/lib/rpm/Packages > > D: closed db index /var/lib/rpm/Packages > > D: closed db environment /var/lib/rpm/Packages > > error: cannot open Packages index using db3 - Operation not permitted (1) > > > > Using strace, we can see that it attempts to acquire this WR lock > multiple times until it finally times out and returns failure: > > > > fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 > EAGAIN (Resource temporarily unavailable) > > > > RPM does not get exclusive mode lock because Openscap seems to be > acquiring the lock in read mode: > > > > # lsof /var/lib/rpm/Packages [Command run while the hang is seen] > > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > > probe_rpm 16875 root3rR REG8,2 53420032 352263 > /var/lib/rpm/Packages > > rpm 16891 root3u REG8,2 53420032 352263 > /var/lib/rpm/Packages > > > > # ps –a [Command run while the hang is seen] > > PID TTY TIME CMD > > 16859 pts/000:00:00 oscap > > 16865 pts/000:00:00 probe_system_in > > 16870 pts/000:00:00 probe_family > > 16875 pts/000:00:00 probe_rpminfo > > 16885 pts/000:00:00 probe_system_in > > 16890 pts/000:00:00 bash > > 16891 pts/000:00:00 rpm > > 16934 pts/200:00:00 ps > > # > > > > The RPM library call made in function > src/OVAL/probes/unix/linux/rpminfo.c:probe_init, rpmtsCreate() and the > subsequent rpmtsInitIterator() result in a read lock being taken on the > file /var/lib/rpm/Packages. > > This read lock seems to be released when rpmtsFree() call is made from > probe_fini. > > > > The particular probe_rpminfo in question is not related to the OVAL > check of the rule but related to the CPE platform check. > > > > From the additional traces I added to openscap: > > D: probe_rpminfo: ("seap.msg" ":id" 0 (("rpminfo_object" ":id" > "oval:org.open-scap.cpe.sles-release:obj:1" ":oval_version" "5.10.1" ) > (("name" ":operation" 5 ":var_check" 1 ) "sles-release" ) ) ) > [probe_rpminfo(7045):Thream > Name(7f901b36c700):seap-packet.c:904:SEAP_packet_recv] > > … > > I: probe_rpminfo: gautam : probe_init: Init 1 rpmts > [probe_rpminfo(7050):Thream Name(7f45fcd527c0):rpminfo.c:325:probe_init] > > <> > > I: probe_rpminfo: gautam : probe_fini: Free 1 rpmts > [probe_rpminfo(7045):Thream Name(7f90215a37c0):rpminfo.c:343:probe_fini] > > > > Probe_fini is called in this thread only upon receiving SIGTERM. > > > > *_Why does this work on RHEL?_* > > On RHEL, the implementation of the RPM command does not seem to be > trying to acquire an exclusive mode lock, it also locks in