Re: Reducing NSS's allocation rate

2014-11-12 Thread Nicholas Nethercote
On Tue, Nov 11, 2014 at 11:10 PM, Brian Smith br...@briansmith.org wrote:

 I've attached to the bug a profile that uses a stack trace depth of 10.

 Unfortunately, 10 isn't enough to see the non-NSS entry (one that
 doesn't start with security/nss/) for every case.

I've now attached to the bug a profile that uses a stack trace depth
of 24. That's the maximum I have in this particular profile, and
definitely gets out of security/nss/ :)

Nick
-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Re: Reducing NSS's allocation rate

2014-11-11 Thread Nicholas Nethercote
On Mon, Nov 10, 2014 at 7:06 PM, Ryan Sleevi
ryan-mozdevtechcry...@sleevi.com wrote:

 Not to be a pain and discourage someone from hacking on NSS

My patches are in the following bugs:

https://bugzilla.mozilla.org/show_bug.cgi?id=1094650
https://bugzilla.mozilla.org/show_bug.cgi?id=1095307
https://bugzilla.mozilla.org/show_bug.cgi?id=1096741

I'm happy to hear specific criticisms.

Nick
-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Re: Reducing NSS's allocation rate

2014-11-11 Thread Ryan Sleevi
On Tue, November 11, 2014 10:26 am, Nicholas Nethercote wrote:
  On Mon, Nov 10, 2014 at 7:06 PM, Ryan Sleevi
  ryan-mozdevtechcry...@sleevi.com wrote:
 
  Not to be a pain and discourage someone from hacking on NSS

  My patches are in the following bugs:

  https://bugzilla.mozilla.org/show_bug.cgi?id=1094650
  https://bugzilla.mozilla.org/show_bug.cgi?id=1095307
  https://bugzilla.mozilla.org/show_bug.cgi?id=1096741

  I'm happy to hear specific criticisms.

  Nick
  --

Not trying to be a pain, but I don't think that's fair to position like
that. I'd rather the first question be answered - each one of your patches
adds more branches and more complexity. That part is indisputable, even if
it may be seen as small. However, what measures do we have to ensure that
this is meaningfully improving any objective measure of performance (other
than allocation churn, which allocators are exceptionally capable of
handling)? How do we ensure this doesn't regress? Otherwise, we're adding
complexity without any benefits. And I don't think there are - or at
least, I haven't seen, other than reduces allocations.

-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Re: Reducing NSS's allocation rate

2014-11-11 Thread Robert Relyea

On 11/11/2014 12:32 PM, Ryan Sleevi wrote:

On Tue, November 11, 2014 10:26 am, Nicholas Nethercote wrote:

  On Mon, Nov 10, 2014 at 7:06 PM, Ryan Sleevi
  ryan-mozdevtechcry...@sleevi.com wrote:

Not to be a pain and discourage someone from hacking on NSS

  My patches are in the following bugs:

  https://bugzilla.mozilla.org/show_bug.cgi?id=1094650
  https://bugzilla.mozilla.org/show_bug.cgi?id=1095307
  https://bugzilla.mozilla.org/show_bug.cgi?id=1096741

  I'm happy to hear specific criticisms.

  Nick
  --

Not trying to be a pain, but I don't think that's fair to position like
that. I'd rather the first question be answered - each one of your patches
adds more branches and more complexity.


OK, looking at that patches, I would challege that assertion. The patch 
in  https://bugzilla.mozilla.org/show_bug.cgi?id=1094650 actually 
simplifies the code. I do agree we need to make complexity versus reward 
calculations, but most of these changes don't raise the complexity bar 
very high, if at all.

That part is indisputable, even if
it may be seen as small.

Sorry, I just disputed it;).

  However, what measures do we have to ensure that
this is meaningfully improving any objective measure of performance (other
than allocation churn, which allocators are exceptionally capable of
handling)?
I agree that these changes should be targeted based on instrumentation. 
I believe Nicholas has provided this in the bug.

  How do we ensure this doesn't regress?
I think that depends on the change.  Certainly 
https://bugzilla.mozilla.org/show_bug.cgi?id=1095307 is simple enough, 
that the combination of review and normal testing should give us 
confidence in the patch's correctness.



I'd say the same forhttps://bugzilla.mozilla.org/show_bug.cgi?id=1094650  .
The patch in bug  https://bugzilla.mozilla.org/show_bug.cgi?id=1096741  seems 
more risky, as it delays allocation. So it would require more testing and and a 
more dillegent review, and may actually cross the complexity/benefit bar for 
me. (also, the code in question is code meant to avoid cache attacks, we want 
to make sure we don't reintroduce a new back channel attack).


Otherwise, we're adding
complexity without any benefits. And I don't think there are - or at
least, I haven't seen, other than reduces allocations.
For me the bar is pretty low, particularly since I've seen patches that 
actually simplify the code.  The 'use static until some threshold, the 
malloc' is a pretty well established meme in NSS already, and is 
particularly safe if we can see it's entire use over a single function.


The bigger issue is are we getting proper bang for the buck.  Here we 
have an NSS user who is admittedly tuning for his particular workload, 
but doing performance tuning. It's targeted by hitting the largest 
offenders using instrumented tools to figure that out given the client's 
workload, then knocking out the low hanging fruit. I think that's pretty 
classic optimization strategy.


bob


smime.p7s
Description: S/MIME Cryptographic Signature
-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto

Re: Reducing NSS's allocation rate

2014-11-11 Thread Brian Smith
On Mon, Nov 10, 2014 at 9:04 PM, Nicholas Nethercote
n.netherc...@gmail.com wrote:
 In your analysis, it would be better to use a call stack trace depth
 larger than 5 that allows us to see what non-NSS function is calling
 into NSS.

 I've attached to the bug a profile that uses a stack trace depth of 10.

Unfortunately, 10 isn't enough to see the non-NSS entry (one that
doesn't start with security/nss/) for every case. However, it looks
like the data supports the types of changes that you are making and
also my suggestions for coalescing and caching results, as well as my
suggestion to avoid constructing CERTCertificate objects. Depending on
how much effort you're willing to invest in this, many (probably most)
of those allocations can be avoided. David Keeler is very familiar
with the code in security/certverifier and security/manager/ssl/src
that would be changed to implement the additional things I suggested,
so I suggest you talk to him about it.

Cheers,
Brian
-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Reducing NSS's allocation rate

2014-11-10 Thread Nicholas Nethercote
Hi,

I've been doing some heap allocation profiling and found that during
basic usage NSS accounts for 1/3 of all of Firefox's cumulative (*not*
live) heap allocations. We're talking gigabytes of allocations in
short browsing sessions. That is *insane*.

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1095272 about
this. I've written several patches that fix problems, one of which has
r+ and is awaiting checkin; check the dependent bugs.

This is making Ryan Sleevi is nervous and he wanted me to post
something here about my plans. So here they are: I want to reduce
unnecessary allocations. I want to do so in a very non-intrusive
fashion: I'm aware that NSS is security-sensitive code, and TBH it's
not that enjoyable to read or modify. I will plug away at this as long
as there is low-hanging fruit to be found, which may not be that much
longer.

Nick
-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Re: Reducing NSS's allocation rate

2014-11-10 Thread Ryan Sleevi
On Mon, November 10, 2014 6:51 pm, Nicholas Nethercote wrote:
  Hi,

  I've been doing some heap allocation profiling and found that during
  basic usage NSS accounts for 1/3 of all of Firefox's cumulative (*not*
  live) heap allocations. We're talking gigabytes of allocations in
  short browsing sessions. That is *insane*.

Could you explain why it's insane? I guess that's sort of why I poked you
on the bug. Plenty of allocators are rather smart under such churn, and
NSS itself uses an Arena allocator designed to re-use some (but
understandably not all) allocations.

Is there a set of performance criteria you're measuring? For example,
we're spending X% of CPU in the allocator, and we believe we can reduce it
to Y%, which will improve test Z.

Not to be a pain and discourage someone from hacking on NSS (we always
need more NSS hackers), but I guess I'm just trying to understand the
complexity (and locally caching / lazy instantiating always adds some
degree of complexity, though hopefully minor) vs performance (which is
hopefully being measured) tradeoffs. Further, if we aren't continuously
monitoring this, meaning it's not a metric integrated as something we
watch, then it seems very easy that any improvements you make can quickly
regress, which would be unfortunate for your work.

-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Re: Reducing NSS's allocation rate

2014-11-10 Thread Julien Pierre
Personally, I would like to encourage your efforts. If you are able to 
move many of these allocations from heap-based with locks, to something 
stack-based instead, this will improve NSS server performance 
tremendously. I would be surprised if it was a significant boost to 
client apps like Firefox, though.


On the other hand, I'm not sure that there is that much low-hanging 
fruit, based on the stacks you list in the bug.
Many are dictated by the design of NSS, the PKCS#11 API, and the current 
softoken implementation.
Working within these constraints is not so simple. Keep in mind that 
many things you might want to change cannot be, in order to preserve NSS 
API binary compatibility.


Julien

On 11/10/2014 18:51, Nicholas Nethercote wrote:

Hi,

I've been doing some heap allocation profiling and found that during
basic usage NSS accounts for 1/3 of all of Firefox's cumulative (*not*
live) heap allocations. We're talking gigabytes of allocations in
short browsing sessions. That is *insane*.

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1095272 about
this. I've written several patches that fix problems, one of which has
r+ and is awaiting checkin; check the dependent bugs.

This is making Ryan Sleevi is nervous and he wanted me to post
something here about my plans. So here they are: I want to reduce
unnecessary allocations. I want to do so in a very non-intrusive
fashion: I'm aware that NSS is security-sensitive code, and TBH it's
not that enjoyable to read or modify. I will plug away at this as long
as there is low-hanging fruit to be found, which may not be that much
longer.

Nick


--
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Re: Reducing NSS's allocation rate

2014-11-10 Thread Brian Smith
On Mon, Nov 10, 2014 at 6:51 PM, Nicholas Nethercote
n.netherc...@gmail.com wrote:
 I've been doing some heap allocation profiling and found that during
 basic usage NSS accounts for 1/3 of all of Firefox's cumulative (*not*
 live) heap allocations. We're talking gigabytes of allocations in
 short browsing sessions. That is *insane*.

 I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1095272 about
 this. I've written several patches that fix problems, one of which has
 r+ and is awaiting checkin; check the dependent bugs.

In your analysis, it would be better to use a call stack trace depth
larger than 5 that allows us to see what non-NSS function is calling
into NSS.

The checks done in mozilla::pkix's CheckPublicKeySize can easily be
optimized. But, first check how often the call stack contains
CheckPublicKey vs VerifySignedData; CheckPublicKey can be optimized
even more than VerifySignedData.

My original plans for VerifySignedData was for it to have a cache
added to it, if/when performance testing showed that there was a
performance problem. It is likely that such a cache is important, even
without the heap thrashing that you are concerned about.

Also, there is already a bug on file about caching and coalescing SSL
server cert verification results in SSLServerCertVerification. This is
trickier than the type of caching you can do in VerifySignedData but
it is potentially a bigger win. Also, I think recent changes to
Gecko's connection management (the parallelism to a new host
restricted to 1 bug being fixed) made it more important to do at
least the coalescing part.

Note that when bug 1036103 is fixed (which will be basically whenever
I get around to posting one more patch), it will be possible to avoid
any of the NSS CERT_* API during certificate verification, if people
are willing to do a little (probably quite a bit, actiually)
refactoring.

That that, except for the calls to
SECKEY_DecodeDERSubjectPublicKeyInfo and SECKEY_ExtractPublicKey in
CheckPublicKeySize, mozilla::pkix allocates no memory at all, ever
(once CheckNameConstraints is replaced, which is the thing that is one
patch away from happening).

Cheers,
Brian
-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Re: Reducing NSS's allocation rate

2014-11-10 Thread Nicholas Nethercote
On Mon, Nov 10, 2014 at 8:53 PM, Brian Smith br...@briansmith.org wrote:

 I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1095272 about
 this. I've written several patches that fix problems, one of which has
 r+ and is awaiting checkin; check the dependent bugs.

 In your analysis, it would be better to use a call stack trace depth
 larger than 5 that allows us to see what non-NSS function is calling
 into NSS.

I've attached to the bug a profile that uses a stack trace depth of 10.

Nick
-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto