Re: Let's never, ever, shut down NSS -- even in debug builds

Brian Smith Tue, 29 Jan 2013 10:15:09 -0800

Benjamin Smedberg wrote:
> On 1/28/2013 6:39 PM, Brian Smith wrote:
> > This will greatly simplify lots of code--not just code in
> > security/manager, but also code in WebRTC, DOM (DOMCrypt), Toolkit
> > (toolkit/identity), and Necko (netwerk/). We already have a
> > significant amount of code that is already running in Firefox
> > every day, but which doesn't handle NSS shutdown correctly.
>
> Could you elaborate on the kinds of things that client code currently
> shouldn't do?


Basically, you cannot call any NSS function without (a) having ensured that You 
have initialized the "psm;1" component, and (b) having acquired the 
nsNSSShutdownPreventionLock, and (c) checking that NSS hasn't already been shut 
down. I am not trying to solve any problems related to (a), as that's generally 
handled correctly.

Looking at the code, I have seen the following problematic patterns:

1. Implementations of nsNSSShutdownObject that call NSS functions in their 
destructor without acquiring the nsNSSShutdownPreventionLock--basically, every 
implementation except one.
2. Quite a few PSM functions that acquire the nsNSSShutdownPreventionLock and 
then fail to check if NSS has been shut down before calling any NSS functions.
3. Code that is not destroying all its references to NSS objects during 
shutdown (i.e. not implementing nsNSSShutdownObject / CryptoTask correctly), or 
which is calling into NSS completely oblivious to NSS shutdown.

> Is this related to network activity, or something else?
> As far as I understand things, the network gets shut down before we
> get to NSS shutdown, so there really shouldn't be any clients calling
> into PSM via the network by the time we get to that point anyway.

Should we be requiring networking to be shut down at all? Or, should networking 
be made oblivious to shutdown (besides putting the HTTP cache in a read-only 
state, perhaps) and run all the way up to exit(0)?

Many (most?) of the crashes we've seen during NSS shutdown clearly show network 
activity happening concurrently on another thread. If we really need to make 
sure networking is shut down cleanly before we exit(0), then we can fix the 
networking code to not do that. That's actually what prompted this 
investigation in the first place. But, if the only reason we want networking to 
shut down cleanly is so that it can handle NSS shutdown properly, then I'd 
rather just avoid the issue entirely by keeping NSS available all the time and 
avoid the issue.

> If there are other APIs (cert management or whatnot) that are getting
> called late, wouldn't it be ok to just make them fail?

That's what's already supposed to happen. The problem is that many of the very 
many implementations of the "just make them fail" is missing or incorrect.

> This is indeed a concern. We have already given up on leak reporting
> in release builds, but we do still track leaks from debug builds and
> even back out patches that cause leaks. I would be concerned about
> patches that make it impossible to track leaks.

I share that concern. On the other hand, having perfect leak detection means 
writing and maintaining a *lot* of otherwise-unnecessary code.

> > 2. Because NSS reads and writes to files in the profile directory,
> > the profile directory must be readable and writable up until
> > process exit. The current rules for XPCOM shutdown say that
> > services must stop doing disk I/O well before then; we would need
> > to change the rules. It is safe to do so?
> Why is NSS doing this? We should be done with activities like
> certificate management well before... can we ask NSS to sync the
> important data to disk and then stop touching the disk? If NSS is
> actually touching the disk at process exit, doesn't that leave us in
> a possibly-inconsistent state?

One example: we write certificates that we received from NSS to the NSS 
certificate database as we parse them from the SSL handshake. So, as long as it 
is possible for an SSL handshake to occur, we will write to the database. Also, 
like Bob noted, the dbm-based NSS database we use seems to like to write to the 
database at shutdown just for fun; this is something that could be fixed.

Obviously we need to make sure that we don't corrupt the NSS databases at 
shutdown by exiting in the middle of a write. It is something that will need to 
be investigated going forward. Presumably switching to the SQLite-based 
database would solve this problem.

Cheers,
Brian
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Re: Let's never, ever, shut down NSS -- even in debug builds

Reply via email to