Re: [Open-scap] RPM based remediation hangs on SUSE.

2016-04-01 Thread Šimon Lukašík
On 04/01/2016 05:07 PM, Šimon Lukašík wrote:
> Hello Gautam,
> 
> Firstly, let me just express gratitude for in depth analysis. I applaud
> to engineering virtue. Well done!
> 
> I would approach this thing as follows. I would connect with RPM
> maintainers in SuSE, why it needs to get exclusive lock.
> 
> I am not equipped to claim which solution (rpm in RHEL, vs. rpm in SuSE)
> is "better". However, having the lesser constraint seems better until
> there is risk for data corruption. I have been QE of RPM in RHEL many
> years ago, and I don't remember any issues with weak locks.
> 
> 
> Then, for OpenSCAP, we run the probe any time it is needed and we stop
> it only at the scan completion. That has performance benefits (I still
> have to see scanner that performs quicker than OpenSCAP :)). And then
> each probe holds cache of things that have been already queried. So,
> killing the probe is out of question in general.
> 
> However, in this specific case. We already drop all the cache when we do
> remediation (because cache would still return pre-remediated  values).
> So perhaps you can write code to stop all probes before remediation starts.
> 

Ah, this paragraph is void.

The killing is already done. :)

The problem is that when we decide whether the specific  element is
applicable, we run the rpminfo probe again.
https://github.com/OpenSCAP/openscap/blob/78c8706d961270f1878d0639bbceee3f3fb7623f/src/XCCDF_POLICY/xccdf_policy_remediate.c#L135


So, perhaps you can create SuSE specific patch to rpminfo probe to
return the lock after each request?

Best,
~š.

> The clean of the cache is here
> https://github.com/OpenSCAP/openscap/blob/78c8706d961270f1878d0639bbceee3f3fb7623f/src/XCCDF/xccdf_session.c#L1413
> 
> Good luck!
> ~š.
> 
> 
> On 03/31/2016 08:32 AM, S, Gautam wrote:
>> Hello folks,
>>
>>  
>>
>> I am looking at a rather obscure issue related to the way the RPM
>> library behaves on SUSE. While trying to run remediation for a rule
>> related to RPMs, the process hangs until it finally times out.
>>
>>  
>>
>> # oscap xccdf eval --profile test --remediate sles11-xccdf.xml
>>
>> Title   Uninstall bind Package
>>
>> Rulepackage_bind_removed
>>
>> Ident   CCE-27030-6
>>
>> Result  fail
>>
>>  
>>
>>  
>>
>> --- Starting Remediation ---
>>
>>  
>>
>> Title   Uninstall bind Package
>>
>> Rulepackage_bind_removed
>>
>> Ident   CCE-27030-6
>>
>>  
>>
>> <>
>>
>>  
>>
>> Result  error
>>
>>  
>>
>> I have collected openscap verbose logs and RPM logs and it looks like
>> there is a deadlock.
>>
>>  
>>
>> From RPM verbose mode logs while the fix is running the “rpm –e” operation:
>>
>>  
>>
>> D: opening  db environment /var/lib/rpm/Packages create:cdb:mpool:private
>>
>> D: opening  db index   /var/lib/rpm/Packages create mode=0x42
>>
>> warning: waiting for exclusive lock on /var/lib/rpm/Packages
>>
>> error: cannot get exclusive lock on /var/lib/rpm/Packages
>>
>> D: closed   db index   /var/lib/rpm/Packages
>>
>> D: closed   db environment /var/lib/rpm/Packages
>>
>> error: cannot open Packages index using db3 - Operation not permitted (1)
>>
>>  
>>
>> Using strace, we can see that it attempts to acquire this WR lock
>> multiple times until it finally times out and returns failure:
>>
>>  
>>
>> fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1
>> EAGAIN (Resource temporarily unavailable)
>>
>>  
>>
>> RPM does not get exclusive mode lock because Openscap seems to be
>> acquiring the lock in read mode:
>>
>>  
>>
>> # lsof /var/lib/rpm/Packages [Command run while the hang is seen]
>>
>> COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
>>
>> probe_rpm 16875 root3rR  REG8,2 53420032 352263
>> /var/lib/rpm/Packages
>>
>> rpm   16891 root3u   REG8,2 53420032 352263
>> /var/lib/rpm/Packages
>>
>>  
>>
>> # ps –a [Command run while the hang is seen]
>>
>>   PID TTY  TIME CMD
>>
>> 16859 pts/000:00:00 oscap
>>
>> 16865 pts/000:00:00 probe_system_in
>>
>> 16870 pts/000:00:00 probe_family
>>
>> 16875 pts/000:00:00 probe_rpminfo
>>
>> 16885 pts/000:00:00 probe_system_in
>>
>> 16890 pts/000:00:00 bash
>>
>> 16891 pts/000:00:00 rpm
>>
>> 16934 pts/200:00:00 ps
>>
>> #
>>
>>  
>>
>> The RPM library call made in function
>> src/OVAL/probes/unix/linux/rpminfo.c:probe_init, rpmtsCreate() and the
>> subsequent rpmtsInitIterator() result in a read lock being taken on the
>> file /var/lib/rpm/Packages.
>>
>> This read lock seems to be released when rpmtsFree() call is made from
>> probe_fini.
>>
>>  
>>
>> The particular probe_rpminfo in question is not related to the OVAL
>> check of the rule but related to the CPE platform check.
>>
>>  
>>
>> From the additional traces I added to openscap:
>>
>> D: probe_rpminfo: ("seap.msg" ":id" 0 (("rpminfo_object" ":id"
>> "oval:org.open-scap.cpe.sles-release:obj:1" ":oval_version" "5.10.1" )
>> (("name" ":operation" 5 ":var_check" 1 ) "sles-release" ) ) )
>> 

Re: [Open-scap] RPM based remediation hangs on SUSE.

2016-04-01 Thread Šimon Lukašík
Hello Gautam,

Firstly, let me just express gratitude for in depth analysis. I applaud
to engineering virtue. Well done!

I would approach this thing as follows. I would connect with RPM
maintainers in SuSE, why it needs to get exclusive lock.

I am not equipped to claim which solution (rpm in RHEL, vs. rpm in SuSE)
is "better". However, having the lesser constraint seems better until
there is risk for data corruption. I have been QE of RPM in RHEL many
years ago, and I don't remember any issues with weak locks.


Then, for OpenSCAP, we run the probe any time it is needed and we stop
it only at the scan completion. That has performance benefits (I still
have to see scanner that performs quicker than OpenSCAP :)). And then
each probe holds cache of things that have been already queried. So,
killing the probe is out of question in general.

However, in this specific case. We already drop all the cache when we do
remediation (because cache would still return pre-remediated  values).
So perhaps you can write code to stop all probes before remediation starts.

The clean of the cache is here
https://github.com/OpenSCAP/openscap/blob/78c8706d961270f1878d0639bbceee3f3fb7623f/src/XCCDF/xccdf_session.c#L1413

Good luck!
~š.


On 03/31/2016 08:32 AM, S, Gautam wrote:
> Hello folks,
> 
>  
> 
> I am looking at a rather obscure issue related to the way the RPM
> library behaves on SUSE. While trying to run remediation for a rule
> related to RPMs, the process hangs until it finally times out.
> 
>  
> 
> # oscap xccdf eval --profile test --remediate sles11-xccdf.xml
> 
> Title   Uninstall bind Package
> 
> Rulepackage_bind_removed
> 
> Ident   CCE-27030-6
> 
> Result  fail
> 
>  
> 
>  
> 
> --- Starting Remediation ---
> 
>  
> 
> Title   Uninstall bind Package
> 
> Rulepackage_bind_removed
> 
> Ident   CCE-27030-6
> 
>  
> 
> <>
> 
>  
> 
> Result  error
> 
>  
> 
> I have collected openscap verbose logs and RPM logs and it looks like
> there is a deadlock.
> 
>  
> 
> From RPM verbose mode logs while the fix is running the “rpm –e” operation:
> 
>  
> 
> D: opening  db environment /var/lib/rpm/Packages create:cdb:mpool:private
> 
> D: opening  db index   /var/lib/rpm/Packages create mode=0x42
> 
> warning: waiting for exclusive lock on /var/lib/rpm/Packages
> 
> error: cannot get exclusive lock on /var/lib/rpm/Packages
> 
> D: closed   db index   /var/lib/rpm/Packages
> 
> D: closed   db environment /var/lib/rpm/Packages
> 
> error: cannot open Packages index using db3 - Operation not permitted (1)
> 
>  
> 
> Using strace, we can see that it attempts to acquire this WR lock
> multiple times until it finally times out and returns failure:
> 
>  
> 
> fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1
> EAGAIN (Resource temporarily unavailable)
> 
>  
> 
> RPM does not get exclusive mode lock because Openscap seems to be
> acquiring the lock in read mode:
> 
>  
> 
> # lsof /var/lib/rpm/Packages [Command run while the hang is seen]
> 
> COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
> 
> probe_rpm 16875 root3rR  REG8,2 53420032 352263
> /var/lib/rpm/Packages
> 
> rpm   16891 root3u   REG8,2 53420032 352263
> /var/lib/rpm/Packages
> 
>  
> 
> # ps –a [Command run while the hang is seen]
> 
>   PID TTY  TIME CMD
> 
> 16859 pts/000:00:00 oscap
> 
> 16865 pts/000:00:00 probe_system_in
> 
> 16870 pts/000:00:00 probe_family
> 
> 16875 pts/000:00:00 probe_rpminfo
> 
> 16885 pts/000:00:00 probe_system_in
> 
> 16890 pts/000:00:00 bash
> 
> 16891 pts/000:00:00 rpm
> 
> 16934 pts/200:00:00 ps
> 
> #
> 
>  
> 
> The RPM library call made in function
> src/OVAL/probes/unix/linux/rpminfo.c:probe_init, rpmtsCreate() and the
> subsequent rpmtsInitIterator() result in a read lock being taken on the
> file /var/lib/rpm/Packages.
> 
> This read lock seems to be released when rpmtsFree() call is made from
> probe_fini.
> 
>  
> 
> The particular probe_rpminfo in question is not related to the OVAL
> check of the rule but related to the CPE platform check.
> 
>  
> 
> From the additional traces I added to openscap:
> 
> D: probe_rpminfo: ("seap.msg" ":id" 0 (("rpminfo_object" ":id"
> "oval:org.open-scap.cpe.sles-release:obj:1" ":oval_version" "5.10.1" )
> (("name" ":operation" 5 ":var_check" 1 ) "sles-release" ) ) )
> [probe_rpminfo(7045):Thream
> Name(7f901b36c700):seap-packet.c:904:SEAP_packet_recv]
> 
> …
> 
> I: probe_rpminfo: gautam : probe_init: Init 1 rpmts
> [probe_rpminfo(7050):Thream Name(7f45fcd527c0):rpminfo.c:325:probe_init]
> 
> <>
> 
> I: probe_rpminfo: gautam : probe_fini: Free 1 rpmts
> [probe_rpminfo(7045):Thream Name(7f90215a37c0):rpminfo.c:343:probe_fini]
> 
>  
> 
> Probe_fini is called in this thread only upon receiving SIGTERM.
> 
>  
> 
> *_Why does this work on RHEL?_*
> 
> On RHEL, the implementation of the RPM command does not seem to be
> trying to acquire an exclusive mode lock, it also locks in