On 01/07/2013 03:09 PM, Rob Crittenden wrote:
Petr Viktorin wrote:
On 01/06/2013 09:00 PM, Rob Crittenden wrote:
Each of the CA subsystem certificates would trigger a restart during
renewal. This generally caused one or more of the renewals to fail due
to the CA being down.

We also need to fix the trust on the audit cert post-installation. It
was possible that both certmonger and certutil could have the NSS
database open read/write which is almost guaranteed to result in
corruption.

So instead I picked the audit cert as the "lead" cert. It will handle
restarting the CA.

It will also wait until all the other CA subsystem certs are in a
MONITORING state before trying to update the trust. This should prevent
the multiple read/write problem.
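
The waiting step described above could be sketched roughly as follows. This is only an illustration, not the code from the patch: `wait_for_monitoring` and the `get_states` callable are hypothetical names, and how the states are actually obtained from certmonger is left out. The 600-second default mirrors the ten minutes mentioned later in this thread.

```python
import time


def wait_for_monitoring(get_states, timeout=600, interval=5,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until every tracked cert reports MONITORING, or give up.

    get_states is a hypothetical callable returning {nickname: state};
    it stands in for whatever query the real script performs.
    Returns True if all certs reached MONITORING before the timeout.
    """
    deadline = clock() + timeout
    while True:
        states = get_states()
        if all(state == "MONITORING" for state in states.values()):
            return True
        if clock() >= deadline:
            # Give up; the caller should log the failure and skip the
            # trust update rather than touch the NSS database.
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.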

The CA wasn't actually working post-renewal anyway because the user it
uses to bind to DS wasn't being updated properly. certmap.conf is
configured to compare the cert provided by the client with that stored
in LDAP and since we weren't updating it, dogtag couldn't properly bind
to its own DS instance.

We also update a ou=People entry for the RA agent cert so I pulled that
updating code into cainstance.py for easier sharing.

Finally, the wrong service name was being used for tomcat to do the
restart. This is fixed. I've tested this with 3.1/dogtag 10 but it
should work with dogtag 9 as well (which uses a different service naming
convention).

This is how I test:

- ipa-server-install ...
- getcert list | grep expires
- examine the first four certs, pick a date ~28 days prior to their expiration
- date MMDDhhmmCCYY
- getcert list|grep status

Wait until all but one is in MONITORING. That last one should be the
audit cert.

I usually at this point switch to watching a tail of /var/log/messages
until the CA restarts.

Confirm that things are working with:

- ipa cert-show 1

To really be sure, use the ipa cert-request command to issue a new cert.

Ideally you'll verify that things are working, then trigger another
renewal event. Use getcert list|grep expires to pick a date that renews the
HTTP/DS server certs, then do this again for the CA subsystem certs.

It should come up again.

rob


Works for me, but I have some questions (this is an area I know little
about).

Can we be 100% sure these certs are always renewed together? Is
certmonger the only possible mechanism to update them?

You raise a good point. If through some mechanism someone replaces one of
these certs it will cause the script to fail. Some notification of this
failure will be logged though, and of course, the certs won't be renewed.

One could conceivably manually renew one of these certificates. It is
probably a very remote possibility but it is non-zero.

Can we be sure certmonger always does the updates in parallel? If it
managed to update the audit cert before starting on the others, we'd get
no CA restart for the others.

These all get issued at the same time so should expire at the same time
as well (see problem above). The script will hang around for 10 minutes
waiting for the renewal to complete, then give up.

The certs might take different amounts of time to update, right? Eventually, the expirations could go out of sync enough for it to matter. AFAICS, without proper locking we still get a race condition when the other certs start being renewed some time (much less than 10 min) after the audit one:

(time axis goes down)

        audit cert                  other cert
        ----------                  ----------
    certmonger does renew                .
  post-renew script starts               .
 check state of other certs: OK          .
            .                   certmonger starts renew
 certutil modifies NSS DB  +  certmonger modifies NSS DB  == boom!


The state the system would be in is this:

- audit cert trust not updated, so next restart of CA will fail
- CA is not restarted so will not use updated certificates

And anyway, why does certmonger do renewals in parallel? It seems that
if it did one at a time, always waiting until the post-renew script is
done, this patch wouldn't be necessary.


From what Nalin told me, certmonger has some coarse locking such that
renewals in the same NSS database are serialized. As you point out, it
would be nice to extend this locking to the post renewal scripts. We can
ask Nalin about it. That would fix the potential corruption issue. It is
still much nicer to not have to restart dogtag 4 times.
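
To illustrate what extending the locking to post-renewal scripts might look like, here is a minimal sketch using an advisory file lock. It is not how certmonger locks internally (that is not described in this thread); `nssdb_lock` and the lock file path are hypothetical, and the approach only helps if every writer to the NSS database takes the same lock.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def nssdb_lock(lock_path, blocking=True):
    """Hold an exclusive advisory lock for the duration of the block.

    lock_path is a hypothetical lock file shared by all cooperating
    writers; processes that do not take the lock are not excluded.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # With LOCK_NB this raises BlockingIOError if the lock is held.
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A post-renewal script would then wrap its certutil call in `with nssdb_lock(path):` so the trust update cannot overlap with another holder of the same lock.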


Well, three extra restarts every few years seems like a small price to pay for robustness.


--
Petr³

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel
