Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-13 Thread Boris Zbarsky

On 7/13/18 5:22 PM, gsquel...@mozilla.com wrote:

E.g., could I instrument one class, so that every allocation would be tracked 
automatically, and I'd get nice stats at the end?


You mean apart from just having a memory reporter for it?


Including wasted space because of larger allocation blocks?


Memory reporters using mallocSizeOf include that space, yes.


Could I even run what-if scenarios, where I could instrument a class and 
extract its current size but also provide an alternate size (based on what I 
think I could make it shrink), and in the end I'll know how much I could save 
overall?


You could hack the relevant memory reporter, sure.
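
(To make the what-if idea concrete, here is a minimal standalone sketch -- not
Gecko code; the class, the counters, and the 24-byte "shrunken" size are all
made up -- of the kind of accounting a hacked reporter could do: measure what a
class currently costs, including allocator round-up, next to a hypothesized
smaller size.)

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <malloc.h>  // malloc_usable_size (glibc); macOS would use malloc_size

struct Thing {
  char payload[40];

  static size_t sCurrentBytes;       // bytes actually consumed, incl. slop
  static size_t sHypotheticalBytes;  // what-if: bytes if Thing were shrunk

  static void* operator new(size_t aSize) {
    void* p = malloc(aSize);
    sCurrentBytes += malloc_usable_size(p);  // counts allocator round-up too
    sHypotheticalBytes += 24;                // pretend Thing could be 24 bytes
    return p;
  }
  static void operator delete(void* aPtr) {
    sCurrentBytes -= malloc_usable_size(aPtr);
    sHypotheticalBytes -= 24;
    free(aPtr);
  }
};

size_t Thing::sCurrentBytes = 0;
size_t Thing::sHypotheticalBytes = 0;

int main() {
  Thing* things[1000];
  for (auto& t : things) t = new Thing();
  printf("current: %zu, hypothetical: %zu, potential saving: %zu bytes\n",
         Thing::sCurrentBytes, Thing::sHypotheticalBytes,
         Thing::sCurrentBytes - Thing::sHypotheticalBytes);
  for (auto t : things) delete t;
  return 0;
}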


Do we have Try tests that simulate real-world usage, so we could collect 
memory-usage data that's relevant to our users, but also reproducible?


See the "awsy-10s" test suite, which sort of aims to do that.


Should there be some kind of Talos-like CI tests that focus on memory usage, so 
we'd get some warning if a particular patch suddenly eats too much memory?


This is what awsy-e10s aims to do, yes.

-Boris


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-13 Thread gsquelart
On Wednesday, July 11, 2018 at 4:19:15 AM UTC+10, Kris Maglione wrote:
> [...]
> Essentially what this means, though, is that if we identify an area of 
> overhead that's 50KB[3] or larger that can be eliminated, it *has* to be 
> eliminated. There just aren't that many large chunks to remove. They all need 
> to go. And if an area of code has a dozen 5KB chunks that can be eliminated, 
> maybe they don't all have to go, but at least half of them do. The more the 
> better.

Some questions -- sorry if some of this is already common knowledge or has 
been discussed.

Are there tools available that could easily track memory usage of specific 
things?
E.g., could I instrument one class, so that every allocation would be tracked 
automatically, and I'd get nice stats at the end?
Including wasted space because of larger allocation blocks?

Could I even run what-if scenarios, where I could instrument a class and 
extract its current size but also provide an alternate size (based on what I 
think I could make it shrink), and in the end I'll know how much I could save 
overall?

Do we have Try tests that simulate real-world usage, so we could collect 
memory-usage data that's relevant to our users, but also reproducible?

Should there be some kind of Talos-like CI tests that focus on memory usage, so 
we'd get some warning if a particular patch suddenly eats too much memory?


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-13 Thread Kris Maglione

On Fri, Jul 13, 2018 at 11:14:24AM -0400, Randell Jesup wrote:

Hash tables are a big issue.  There are a lot of 64K/128K/256K
allocations at the moment for hashtables.  When we started looking at
this in bug 1436250, we had a 256K, ~4 128K, and a whole bunch of 64K
hashtable allocs (on linux).  Some may be smaller or gone now, but it's
still big.

I wonder if it's worth the perf hit to realloc to exact size hash tables
that are build-once - probably.  hashtable->Finalize()?  (I wonder if
that would let us make any other memory/speed optimizations if we know
the table is now static.)


I think, as much as possible, we really want static or mostly-static 
hash tables to be shared between processes. I've already been working on this 
in a few areas, e.g., bug 1470365 for string bundles, which are completely 
static, and bug 1471025 for preferences, which are mostly static.


And those patches add helpers which should make it pretty easy to do the same 
for more things in the future, so that should probably be our go-to strategy 
for reducing per-process overhead, when possible.
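
(Illustration only -- the real helpers live in the bugs above, and the names
below are invented: the pattern is one immutable snapshot that every process
can map read-only, plus a small per-process overlay for anything local.)

#include <cstdio>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Not the Gecko API: a stand-in for a table whose read-only part is shared.
class SharedishStringTable {
 public:
  // In Gecko the snapshot would be built once in the parent and handed to
  // children as read-only shared memory; here it's just an immutable map.
  explicit SharedishStringTable(
      const std::unordered_map<std::string, std::string>* aSnapshot)
      : mSnapshot(aSnapshot) {}

  // Local writes go into the (hopefully tiny) per-process overlay.
  void SetLocal(std::string aKey, std::string aValue) {
    mOverlay[std::move(aKey)] = std::move(aValue);
  }

  std::optional<std::string_view> Get(const std::string& aKey) const {
    if (auto it = mOverlay.find(aKey); it != mOverlay.end()) {
      return std::string_view(it->second);
    }
    if (auto it = mSnapshot->find(aKey); it != mSnapshot->end()) {
      return std::string_view(it->second);
    }
    return std::nullopt;
  }

 private:
  const std::unordered_map<std::string, std::string>* mSnapshot;  // shared
  std::unordered_map<std::string, std::string> mOverlay;          // per-process
};

int main() {
  static const std::unordered_map<std::string, std::string> snapshot = {
      {"greeting", "hello"}};
  SharedishStringTable table(&snapshot);
  table.SetLocal("greeting", "hi from this process");
  std::printf("%s\n", std::string(*table.Get("greeting")).c_str());
  return 0;
}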



Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-13 Thread Felipe G
> >Also note that dealing with the "importance" of a page is not just a
> >matter of visibility and focus. There are other factors to take into
> >account such as if the page is playing audio or video (like listening to
> >music on YouTube), if it's self-updating and so on.
>
> Absolutely
>

We should think about how we can make different performance and memory
trade-offs for processes that are hosting top-level frames and processes
hosting 3rd-party subframes.



Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-13 Thread Randell Jesup
>On 13/07/2018 04:55, Randell Jesup wrote:
>> Correct - we need to have observers/what-have-you for
>> background/foreground state (and we may want an intermediate state or
>> two - foreground-but-not-focused (for example a visible window that
>> isn't the focused window); recently-in-foreground (switching back and
>> forth); background-for-longer-than-delta, etc.
>> 
>> Modules can use these to drop caches, shut down unnecessary threads,
>> change strategies, force GCs/CCs, etc.

>Also note that dealing with the "importance" of a page is not just a
>matter of visibility and focus. There are other factors to take into
>account such as if the page is playing audio or video (like listening to
>music on YouTube), if it's self-updating and so on.

Absolutely.

>The only mechanism to reduce memory consumption we have now is
>memory-pressure events which while functional are still under-used. We
>might also need more fine grained mechanisms than "drop as much memory
>as you can".

This is also very important for GeckoView.

-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-13 Thread Randell Jesup
>On Thu, Jul 12, 2018 at 08:56:28AM -0700, Andrew McCreight wrote:
>>On Thu, Jul 12, 2018 at 3:57 AM, Emilio Cobos Álvarez 
>>wrote:
>>
>>> Just curious, is there a bug on file to measure excess capacity on
>>> nsTArrays and hash tables?
[snip]
>I kind of suspect that improving the storage efficiency of hashtables (and
>probably nsTArrays too) will have an out-sized effect on per-process
>memory. Just at startup, for a mostly empty process, we have a huge amount
>of memory devoted to hashtables that would otherwise be shared across a
>bunch of origins—enough that removing just 4 bytes of padding per entry
>would save 87K per process. And that number tends to grow as we populate
>caches that we need for things like layout and atoms.

Hash tables are a big issue.  There are a lot of 64K/128K/256K
allocations at the moment for hashtables.  When we started looking at
this in bug 1436250, we had a 256K, ~4 128K, and a whole bunch of 64K
hashtable allocs (on linux).  Some may be smaller or gone now, but it's
still big.

I wonder if it's worth the perf hit to realloc to exact size hash tables
that are build-once - probably.  hashtable->Finalize()?  (I wonder if
that would let us make any other memory/speed optimizations if we know
the table is now static.)
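
(A sketch of that idea using std::unordered_map, since hashtable->Finalize()
doesn't exist today and Gecko's own tables work differently: once a build-once
table is fully populated, rebuild it into a container reserved for exactly its
current entry count, trading one extra pass for dropping the growth slack.)

#include <cstdio>
#include <string>
#include <unordered_map>

template <typename K, typename V>
void FinalizeTable(std::unordered_map<K, V>& aTable) {
  std::unordered_map<K, V> compact;
  compact.reserve(aTable.size());  // sized for exactly what we need
  compact.insert(aTable.begin(), aTable.end());
  aTable.swap(compact);            // the over-allocated table is freed here
}

int main() {
  std::unordered_map<std::string, int> table;
  for (int i = 0; i < 10000; ++i) {
    table["key" + std::to_string(i)] = i;
  }
  printf("buckets before finalize: %zu\n", table.bucket_count());
  FinalizeTable(table);
  printf("buckets after finalize:  %zu\n", table.bucket_count());
  return 0;
}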

-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-13 Thread David Major
This touches on a really important point: we're not the only ones
allocating memory.

Just a few that come to mind: GPU drivers, system media codecs, a11y
tools, and especially on Windows we have to deal with "utility"
applications, corporate-mandated gunk, and downright crapware.

When measuring progress toward our goals, look not only at your own
pristine dev box but also at that one neighbor's machine whose adware
you're always cleaning out.


On Fri, Jul 13, 2018 at 7:57 AM Gabriele Svelto  wrote:
>
> Just another bit of info to raise awareness on a thorny issue we have to
> face if we want to significantly raise the number of content processes.
> On 64-bit Windows we often consume significantly more commit space than
> physical memory. This consumption is currently unaccounted for in
> about:memory, though I've seen hints of it being caused by the GPU driver
> (or other parts of the graphics pipeline). I've filed bug 1475518 [1] so
> that I don't forget and I encourage anybody with Windows experience to
> have a look because it's something we _need_ to solve to reduce content
> process memory usage.
>
>  Gabriele
>
> [1] Commit-space usage investigation
> https://bugzilla.mozilla.org/show_bug.cgi?id=1475518


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-13 Thread Gabriele Svelto
Just another bit of info to raise awareness on a thorny issue we have to
face if we want to significantly raise the number of content processes.
On 64-bit Windows we often consume significantly more commit space than
physical memory. This consumption is currently unaccounted for in
about:memory, though I've seen hints of it being caused by the GPU driver
(or other parts of the graphics pipeline). I've filed bug 1475518 [1] so
that I don't forget and I encourage anybody with Windows experience to
have a look because it's something we _need_ to solve to reduce content
process memory usage.

 Gabriele

[1] Commit-space usage investigation
https://bugzilla.mozilla.org/show_bug.cgi?id=1475518





Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-13 Thread Gabriele Svelto
On 13/07/2018 04:55, Randell Jesup wrote:
> Correct - we need to have observers/what-have-you for
> background/foreground state (and we may want an intermediate state or
> two - foreground-but-not-focused (for example a visible window that
> isn't the focused window); recently-in-foreground (switching back and
> forth); background-for-longer-than-delta, etc.
> 
> Modules can use these to drop caches, shut down unnecessary threads,
> change strategies, force GCs/CCs, etc.
> 
> Some of this certainly already exists, but may need to be extended (and
> used a lot more).

We already had most of this stuff in the ProcessPriorityManager [1],
which has only ever been used in Firefox OS. Since we had
one-process-per-tab there, it was designed that way, so it might need some
reworking to deal with one tab consisting of multiple content processes.

Also note that dealing with the "importance" of a page is not just a
matter of visibility and focus. There are other factors to take into
account such as if the page is playing audio or video (like listening to
music on YouTube), if it's self-updating and so on.

The only mechanism we have now to reduce memory consumption is
memory-pressure events, which, while functional, are still under-used. We
might also need more fine-grained mechanisms than "drop as much memory
as you can".

 Gabriele

[1]
https://searchfox.org/mozilla-central/rev/46292b1212d2d61d7b5a7df184406774727085b8/dom/ipc/ProcessPriorityManager.cpp





Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Randell Jesup
>On 07/12/2018 11:08 PM, Randell Jesup wrote:
>> We may need to trade first-load time against memory use by lazy-initing
>> more things than now, though we did quite a bit on that already for
>> reducing startup time.
>
>One thing to remember is that some of the child processes will be more
>important than others. For example all the processes used for browsing
>contexts in the foreground tab should probably prefer performance over
>memory (in cases that is something we can choose from), but if a
>process is only used for browsing contexts in background tabs and isn't
>playing any audio or such, it can probably use less memory hungry
>approaches.

Correct - we need to have observers/what-have-you for
background/foreground state (and we may want an intermediate state or
two - foreground-but-not-focused (for example a visible window that
isn't the focused window); recently-in-foreground (switching back and
forth); background-for-longer-than-delta, etc.

Modules can use these to drop caches, shut down unnecessary threads,
change strategies, force GCs/CCs, etc.

Some of this certainly already exists, but may need to be extended (and
used a lot more).
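
(A standalone sketch of that state taxonomy -- the enum, names, and delta
handling below are invented, not an existing Gecko API: the interesting part is
the time-based demotion from "recently in foreground" to "background for longer
than delta", which is where modules would drop caches and shut down threads.)

#include <chrono>
#include <cstdio>

enum class ProcState {
  Foreground,
  ForegroundNotFocused,  // visible window that isn't the focused window
  RecentlyForeground,    // just switched away; keep caches warm for now
  Background,            // hidden for longer than some delta
};

class ProcStateTracker {
  using Clock = std::chrono::steady_clock;

 public:
  explicit ProcStateTracker(std::chrono::seconds aDelta) : mDelta(aDelta) {}

  void OnVisibilityChange(bool aVisible, bool aFocused) {
    if (aVisible) {
      mState = aFocused ? ProcState::Foreground
                        : ProcState::ForegroundNotFocused;
    } else {
      mState = ProcState::RecentlyForeground;
      mHiddenSince = Clock::now();
    }
  }

  // Poll this (or run it off a timer) to demote once we've been hidden long
  // enough; observers of the transition would drop caches, force GC/CC, etc.
  ProcState Current() {
    if (mState == ProcState::RecentlyForeground &&
        Clock::now() - mHiddenSince > mDelta) {
      mState = ProcState::Background;
    }
    return mState;
  }

 private:
  ProcState mState = ProcState::Foreground;
  Clock::time_point mHiddenSince{};
  std::chrono::seconds mDelta;
};

int main() {
  ProcStateTracker tracker(std::chrono::seconds(30));
  tracker.OnVisibilityChange(/* aVisible */ false, /* aFocused */ false);
  // Right after hiding we're still "recently foreground"; after 30 seconds of
  // polling Current() this would report Background instead.
  std::printf("state right after hiding: %d\n",
              static_cast<int>(tracker.Current()));
  return 0;
}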

-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Nicholas Nethercote
On Fri, Jul 13, 2018 at 1:56 AM, Andrew McCreight 
wrote:

> >
> > Just curious, is there a bug on file to measure excess capacity on
> > nsTArrays and hash tables?
>
> njn looked at that kind of issue at some point (he changed how arrays grow,
> for instance, to reduce overhead), but it has probably been around 5 years,
> so there may be room for improvement for things added in the meanwhile.
>

For a trip down memory lane, check out
https://blog.mozilla.org/nnethercote/2011/08/05/clownshoes-available-in-sizes-2101-and-up/.
The size classes described in that post are still in use today.

More usefully: if anyone wants to investigate slop -- which is only one
kind of wasted space, but an important one -- it's now really easy with DMD:
- Invoke DMD in "Live" mode (i.e. generic heap profiling mode, rather than
dark matter detection mode).
- Use the `--sort-by slop` flag with dmd.py.

Full instructions are at
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD.

Nick


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Cameron McCormack
On Fri, Jul 13, 2018, at 6:51 AM, Kris Maglione wrote:
> I actually have a patch sitting around with helpers to make it super easy to 
> use smart pointers as tagged pointers :) I never wound up putting it up for 
review, since my original use case went away, but if you can think of any 
> specific cases where it would be useful, I'd be happy to try and get it 
> landed.

Speaking of tagged pointers, I've used the lower one or two bits for tagging a 
number of times, but I've never tried packing things into the high bits of a 
64-bit pointer.  Is that inadvisable for any reason?  How many bits can I use, 
given the 64-bit platforms we need to support?
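
(For context: the low bits are safe because of alignment -- an 8-byte-aligned
allocation always has its bottom 3 bits clear. The high bits are much more
platform-dependent: on today's x86-64 and most ARM64 configurations user-space
pointers occupy roughly 48 bits of virtual address space, but features like
5-level paging, ARM's top-byte-ignore, and pointer authentication all touch the
upper bits, so any scheme there needs checking per platform. Below is a minimal
low-bit-tagging sketch in generic C++, not the patch mentioned above.)

#include <cassert>
#include <cstdint>
#include <cstdio>

template <typename T, uintptr_t kTagBits = 3>
class TaggedPtr {
  static constexpr uintptr_t kTagMask = (uintptr_t(1) << kTagBits) - 1;

 public:
  TaggedPtr(T* aPtr, uintptr_t aTag) {
    // The pointer must be aligned enough that its low bits are free.
    assert((reinterpret_cast<uintptr_t>(aPtr) & kTagMask) == 0);
    assert(aTag <= kTagMask);
    mBits = reinterpret_cast<uintptr_t>(aPtr) | aTag;
  }
  T* Ptr() const { return reinterpret_cast<T*>(mBits & ~kTagMask); }
  uintptr_t Tag() const { return mBits & kTagMask; }

 private:
  uintptr_t mBits;
};

int main() {
  alignas(8) static int value = 42;
  TaggedPtr<int> p(&value, 5);
  printf("value=%d tag=%ju\n", *p.Ptr(), static_cast<uintmax_t>(p.Tag()));
  return 0;
}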


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Xidorn Quan
On Fri, Jul 13, 2018, at 7:08 AM, smaug wrote:
> One thing to remember is that some of the child processes will be more 
> important than others. For example all the processes used for browsing 
> contexts in 
> the foreground tab should probably prefer performance over memory (in 
> cases that is something we can choose from), but if a process
> is only used for browsing contexts in background tabs and isn't playing 
> any audio or such, it can probably use less memory hungry approaches.
> Like, could stylo use fewer threads when used in background-tabs-only-
> processes, and once the process becomes foreground, more threads are 
> created.

I've filed a bug for this after I saw this email thread: 
https://bugzilla.mozilla.org/show_bug.cgi?id=1475091

- Xidorn


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread smaug

On 07/12/2018 11:08 PM, Randell Jesup wrote:

I do hope that the 100 process figures scenario that was given is a worst-case 
scenario though...


It's not.  Worst case is a LOT worse.

Shutting down threads/threadpools when not needed or off an idle timer
is a Good thing.  There may be some perf hit since it may mean starting
a thread instead of just sending a message at times; this may require
some tuning in specific cases, or leaving 1 thread or more running
anyways.

Stylo will be an interesting case here.

We may need to trade first-load time against memory use by lazy-initing
more things than now, though we did quite a bit on that already for
reducing startup time.




One thing to remember is that some of the child processes will be more important than others. For example, all the processes used for browsing contexts in 
the foreground tab should probably prefer performance over memory (in cases where that is something we can choose from), but if a process
is only used for browsing contexts in background tabs and isn't playing any 
audio or such, it can probably use less memory-hungry approaches.
Like, could stylo use fewer threads when used in 
background-tabs-only processes, and create more threads once the process 
becomes foreground?
We have a similar approach in many cases for performance and responsiveness 
reasons, but less often for memory usage reasons.


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Kris Maglione

On Thu, Jul 12, 2018 at 10:27:13PM +0200, Gabriele Svelto wrote:

On 12/07/2018 22:19, Kris Maglione wrote:

I've actually been thinking on filing a bug to do something similar, to
measure cumulative effects of excess padding in certain types since I
began looking into bug 1460674, and Sylvestre mentioned that
clang-analyzer can generate reports on excess padding.


I've encountered at least one structure where a boolean flag is 64-bits
in size on 64-bit builds. If we really want to go to the last mile we
might want to also evaluate things like tagged pointers; there's
probably some KiB's to be saved there too.


I actually have a patch sitting around with helpers to make it super easy to 
use smart pointers as tagged pointers :) I never wound up putting it up for 
review, since my original use case went away, but if you can think of any 
specific cases where it would be useful, I'd be happy to try and get it 
landed.



Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Kris Maglione

On Thu, Jul 12, 2018 at 04:08:49PM -0400, Randell Jesup wrote:

I do hope that the 100 process figures scenario that was given is a worst-case 
scenario though...


It's not.  Worst case is a LOT worse.

Shutting down threads/threadpools when not needed or off an idle timer
is a Good thing.  There may be some perf hit since it may mean starting
a thread instead of just sending a message at times; this may require
some tuning in specific cases, or leaving 1 thread or more running
anyways.

Stylo will be an interesting case here.

We may need to trade first-load time against memory use by lazy-initing
more things than now, though we did quite a bit on that already for
reducing startup time.


This is a really important point: memory usage and performance are deeply 
intertwined.


There are hard limits on the amount of memory we can use, and the more 
of it we waste needlessly, the less we have available for performance 
optimizations that need it. In the worst (performance) case, we wind up 
swapping, at which point performance may as well not exist.


We're going to have to make hard decisions about when/how often/how 
aggressively we flush caches, spin down threads, unload tabs, ... The 
more unnecessary overhead we save, the less extreme we're going to have 
to be about this. And the better we get at spinning down unused threads 
and evicting low impact cache entries, the less aggressive we're going 
to have to be about the high impact ones. Throwing those things away 
will have a performance impact, but not throwing them away will, in the 
end, have a bigger one.



Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Kris Maglione

On Thu, Jul 12, 2018 at 08:56:28AM -0700, Andrew McCreight wrote:

On Thu, Jul 12, 2018 at 3:57 AM, Emilio Cobos Álvarez 
wrote:


Thanks for doing this!

Just curious, is there a bug on file to measure excess capacity on
nsTArrays and hash tables?


njn looked at that kind of issue at some point (he changed how arrays grow,
for instance, to reduce overhead), but it has probably been around 5 years,
so there may be room for improvement for things added in the meanwhile.
However, our focus here is really on reducing per-process memory overhead,
rather than generic memory improvements, because we've had a lot of focus
on the latter as part of MemShrink, but not the former, so there's likely
easier improvements to be had.


I kind of suspect that improving the storage efficiency of hashtables (and 
probably nsTArrays too) will have an out-sized effect on per-process memory. 
Just at startup, for a mostly empty process, we have a huge amount of memory 
devoted to hashtables that would otherwise be shared across a bunch of 
origins—enough that removing just 4 bytes of padding per entry would save 87K 
per process. And that number tends to grow as we populate caches that we need 
for things like layout and atoms.
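
(Back-of-the-envelope, reading the numbers above: 87K saved at 4 bytes per
entry implies roughly 87,000 / 4, or about 22,000 hashtable entries already
live in a mostly empty content process at startup.)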


As much as I'd like to be able to share many of those caches between 
processes, we're always going to need process-specific hashtables on top 
of the shared ones for things that can't be/shouldn't be/aren't yet shared. 
And that extra overhead tends to grow proportionally to the number of 
processes we have.



On 07/10/2018 08:19 PM, Kris Maglione wrote:


Welcome to the first edition of the Fission MemShrink newsletter.[1]

In this edition, I'll sum up what the project is, and why it matters to
you. In subsequent editions, I'll give updates on progress that we've made,
and areas that we'll need to focus on next.[2]


The Fission MemShrink project is one of the most easily overlooked
aspects of Project Fission (also known as Site Isolation), but is
absolutely critical to its success. And will require a company- and
community-wide effort to meet its goals.

The problem is thus: In order for site isolation to work, we need to be
able to run *at least* 100 content processes in an average Firefox session.
Each of those processes has its own base memory overhead—memory we use just
for creating the process, regardless of what's running in it. In the
post-Fission world, that overhead needs to be less than 10MB per process in
order to keep the extra overhead from Fission below 1GB. Right now, on our
best-case platform, Windows 10, it's somewhere between 17 and 21MB. Linux and
OS-X hover between 25 and 35MB. In other words, between 2 and 3.5GB for an
ordinary session.

That means that, in the best case, we need to reduce the memory we use in
content processes by *at least* 7MB. The problem, of course, is that there
are only so many places we can cut memory without losing functionality, and
even fewer places where we can make big wins. But, there are lots of places
we can make small and medium-sized wins.

So, to put the task into perspective, of all of the places we can cut a
certain amount of overhead, here are the number of each that we need to fix
in order to reach 1MB:

250KB:   4
100KB:  10
75KB:   13
50KB:   20
20KB:   50
10KB:  100
5KB:   200

Now remember: we need to do *all* of these in order to reach our goal.
It's not a matter of one 250KB improvement or 50 5KB improvements. It's 4
250KB *and* 200 5KB improvements. There just aren't enough places we can
cut 250KB. If we fall short in any of those areas, Project Fission will
fail, and Firefox will be the only major browser without site isolation.

But it won't fail, because all of you are awesome, and this is a totally
achievable goal if we all throw our effort behind it.

Essentially what this means, though, is that if we identify an area of
overhead that's 50KB[3] or larger that can be eliminated, it *has* to be
eliminated. There just aren't that many large chunks to remove. They all
need to go. And if an area of code has a dozen 5KB chunks that can be
eliminated, maybe they don't all have to go, but at least half of them do.
The more the better.


To help us triage these issues, we have a tracking bug (
https://bugzil.la/memshrink-content), and a per-bug whiteboard tag
([overhead:...]) which gives an estimate of how much per-process overhead
we believe fixing that bug would eliminate. Please feel free to add
blockers to the tracking bug if you think they're relevant, and to add or
update [overhead] tags if you have reasonable estimates.


With all of that said, here's a brief update of the progress we've made
so far:

In the past month, unique memory per process[4] has dropped 3-4MB[5], and
JS memory usage in particular has dropped 1.1-1.9MB.

Particular credit goes to:

* Eric Rahm added an AWSY test suite to track base content process memory
   (https://bugzil.la/1442361). Results:

Resident unique: 

Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Gabriele Svelto
On 12/07/2018 22:19, Kris Maglione wrote:
> I've actually been thinking on filing a bug to do something similar, to
> measure cumulative effects of excess padding in certain types since I
> began looking into bug 1460674, and Sylvestre mentioned that
> clang-analyzer can generate reports on excess padding.

I've encountered at least one structure where a boolean flag is 64-bits
in size on 64-bit builds. If we really want to go to the last mile we
might want to also evaluate things like tagged pointers; there's
probably some KiB's to be saved there too.
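
(For reference, a standalone illustration of how that happens -- not a real
Gecko struct: on a 64-bit build each bool below effectively costs 8 bytes
because of pointer alignment, and packing the flags into one byte gets the
space back.)

#include <cstdint>
#include <cstdio>

struct Padded {
  void* ptr;    // 8 bytes
  bool flagA;   // 1 byte + 7 bytes padding so the next pointer stays aligned
  void* ptr2;   // 8 bytes
  bool flagB;   // 1 byte + 7 bytes tail padding
};              // typically 32 bytes

struct Packed {
  void* ptr;      // 8 bytes
  void* ptr2;     // 8 bytes
  uint8_t flags;  // bit 0 = flagA, bit 1 = flagB
};                // typically 24 bytes

int main() {
  printf("Padded: %zu bytes, Packed: %zu bytes\n",
         sizeof(Padded), sizeof(Packed));
  return 0;
}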

There's also more than one place where we're using strings to identify
stuff where we could use enums/integers instead. And yeah, my much
delayed refactoring of the observer service got a lot higher on my
priority list after reading this thread.

 Gabriele





Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Kris Maglione

On Thu, Jul 12, 2018 at 12:57:35PM +0200, Emilio Cobos Álvarez wrote:

Thanks for doing this!

Just curious, is there a bug on file to measure excess capacity on 
nsTArrays and hash tables?


I don't think so, but it's a good idea.

I've actually been thinking on filing a bug to do something similar, to 
measure cumulative effects of excess padding in certain types since I 
began looking into bug 1460674, and Sylvestre mentioned that 
clang-analyzer can generate reports on excess padding.


It would probably be a good idea to try to roll this into the same 
project.


One nice change coming up on this front is that bug 1402910 will probably 
allow us to increase the load factors of most of our hashtables without 
losing performance. Having up-to-date numbers for these things would 
probably help decide how to prioritize those sorts of bugs.



On 07/10/2018 08:19 PM, Kris Maglione wrote:

Welcome to the first edition of the Fission MemShrink newsletter.[1]

In this edition, I'll sum up what the project is, and why it matters 
to you. In subsequent editions, I'll give updates on progress that 
we've made, and areas that we'll need to focus on next.[2]



The Fission MemShrink project is one of the most easily overlooked 
aspects of Project Fission (also known as Site Isolation), but is 
absolutely critical to its success. And will require a company- and 
community-wide effort to meet its goals.


The problem is thus: In order for site isolation to work, we need to 
be able to run *at least* 100 content processes in an average 
Firefox session. Each of those processes has its own base memory 
overhead—memory we use just for creating the process, regardless of 
what's running in it. In the post-Fission world, that overhead needs 
to be less than 10MB per process in order to keep the extra overhead 
from Fission below 1GB. Right now, on our best-case platform, 
Windows 10, it's somewhere between 17 and 21MB. Linux and OS-X hover 
between 25 and 35MB. In other words, between 2 and 3.5GB for an 
ordinary session.


That means that, in the best case, we need to reduce the memory we 
use in content processes by *at least* 7MB. The problem, of course, 
is that there are only so many places we can cut memory without 
losing functionality, and even fewer places where we can make big 
wins. But, there are lots of places we can make small and 
medium-sized wins.


So, to put the task into perspective, of all of the places we can 
cut a certain amount of overhead, here are the number of each that 
we need to fix in order to reach 1MB:


250KB:   4
100KB:  10
75KB:   13
50KB:   20
20KB:   50
10KB:  100
5KB:   200

Now remember: we need to do *all* of these in order to reach our 
goal. It's not a matter of one 250KB improvement or 50 5KB 
improvements. It's 4 250KB *and* 200 5KB improvements. There just 
aren't enough places we can cut 250KB. If we fall short in any of 
those areas, Project Fission will fail, and Firefox will be the only 
major browser without site isolation.


But it won't fail, because all of you are awesome, and this is a 
totally achievable goal if we all throw our effort behind it.


Essentially what this means, though, is that if we identify an area 
of overhead that's 50KB[3] or larger that can be eliminated, it 
*has* to be eliminated. There just aren't that many large chunks to 
remove. They all need to go. And if an area of code has a dozen 5KB 
chunks that can be eliminated, maybe they don't all have to go, but 
at least half of them do. The more the better.



To help us triage these issues, we have a tracking bug 
(https://bugzil.la/memshrink-content), and a per-bug whiteboard tag 
([overhead:...]) which gives an estimate of how much per-process 
overhead we believe fixing that bug would eliminate. Please feel 
free to add blockers to the tracking bug if you think they're 
relevant, and to add or update [overhead] tags if you have 
reasonable estimates.



With all of that said, here's a brief update of the progress we've 
made so far:


In the past month, unique memory per process[4] has dropped 
3-4MB[5], and JS memory usage in particular has dropped 1.1-1.9MB.


Particular credit goes to:

* Eric Rahm added an AWSY test suite to track base content process memory
  (https://bugzil.la/1442361). Results:

   Resident unique: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684862,1,4&series=mozilla-central,1684846,1,4&series=mozilla-central,1685133,1,4&series=mozilla-central,1685127,1,4

   Explicit allocations: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1706218,1,4&series=mozilla-inbound,1706220,1,4&series=mozilla-inbound,1706216,1,4

   JS: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684866,1,4&series=mozilla-central,1685137,1,4&series=mozilla-central,1685131,1,4


* Andrew McCreight created a tool for tracking JS memory usage, and 
figuring

  out which scripts and objects are responsible for how much of it
  (https://bugzil.la/1463569).

* Andrew 

Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Randell Jesup
>I do hope that the 100 process figures scenario that was given is a worst-case 
>scenario though...

It's not.  Worst case is a LOT worse.

Shutting down threads/threadpools when not needed or off an idle timer
is a Good thing.  There may be some perf hit since it may mean starting
a thread instead of just sending a message at times; this may require
some tuning in specific cases, or leaving 1 thread or more running
anyways.

Stylo will be an interesting case here.

We may need to trade first-load time against memory use by lazy-initing
more things than now, though we did quite a bit on that already for
reducing startup time.

-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Andrew McCreight
On Thu, Jul 12, 2018 at 3:57 AM, Emilio Cobos Álvarez 
wrote:

> Thanks for doing this!
>
> Just curious, is there a bug on file to measure excess capacity on
> nsTArrays and hash tables?
>
> WebKit has a bunch of bugs like:
>
>   https://bugs.webkit.org/show_bug.cgi?id=186709
>
> Which seem relevant.
>

njn looked at that kind of issue at some point (he changed how arrays grow,
for instance, to reduce overhead), but it has probably been around 5 years,
so there may be room for improvement for things added in the meanwhile.
However, our focus here is really on reducing per-process memory overhead,
rather than generic memory improvements, because we've had a lot of focus
on the latter as part of MemShrink, but not the former, so there's likely
easier improvements to be had.

Andrew


>  -- Emilio
>
> On 07/10/2018 08:19 PM, Kris Maglione wrote:
>
>> Welcome to the first edition of the Fission MemShrink newsletter.[1]
>>
>> In this edition, I'll sum up what the project is, and why it matters to
>> you. In subsequent editions, I'll give updates on progress that we've made,
>> and areas that we'll need to focus on next.[2]
>>
>>
>> The Fission MemShrink project is one of the most easily overlooked
>> aspects of Project Fission (also known as Site Isolation), but is
>> absolutely critical to its success. And will require a company- and
>> community-wide effort to meet its goals.
>>
>> The problem is thus: In order for site isolation to work, we need to be
>> able to run *at least* 100 content processes in an average Firefox session.
>> Each of those processes has its own base memory overhead—memory we use just
>> for creating the process, regardless of what's running in it. In the
>> post-Fission world, that overhead needs to be less than 10MB per process in
>> order to keep the extra overhead from Fission below 1GB. Right now, on our
>> best-case platform, Windows 10, it's somewhere between 17 and 21MB. Linux and
>> OS-X hover between 25 and 35MB. In other words, between 2 and 3.5GB for an
>> ordinary session.
>>
>> That means that, in the best case, we need to reduce the memory we use in
>> content processes by *at least* 7MB. The problem, of course, is that there
>> are only so many places we can cut memory without losing functionality, and
>> even fewer places where we can make big wins. But, there are lots of places
>> we can make small and medium-sized wins.
>>
>> So, to put the task into perspective, of all of the places we can cut a
>> certain amount of overhead, here are the number of each that we need to fix
>> in order to reach 1MB:
>>
>> 250KB:   4
>> 100KB:  10
>> 75KB:   13
>> 50KB:   20
>> 20KB:   50
>> 10KB:  100
>> 5KB:   200
>>
>> Now remember: we need to do *all* of these in order to reach our goal.
>> It's not a matter of one 250KB improvement or 50 5KB improvements. It's 4
>> 250KB *and* 200 5KB improvements. There just aren't enough places we can
>> cut 250KB. If we fall short in any of those areas, Project Fission will
>> fail, and Firefox will be the only major browser without site isolation.
>>
>> But it won't fail, because all of you are awesome, and this is a totally
>> achievable goal if we all throw our effort behind it.
>>
>> Essentially what this means, though, is that if we identify an area of
>> overhead that's 50KB[3] or larger that can be eliminated, it *has* to be
>> eliminated. There just aren't that many large chunks to remove. They all
>> need to go. And if an area of code has a dozen 5KB chunks that can be
>> eliminated, maybe they don't all have to go, but at least half of them do.
>> The more the better.
>>
>>
>> To help us triage these issues, we have a tracking bug (
>> https://bugzil.la/memshrink-content), and a per-bug whiteboard tag
>> ([overhead:...]) which gives an estimate of how much per-process overhead
>> we believe fixing that bug would eliminate. Please feel free to add
>> blockers to the tracking bug if you think they're relevant, and to add or
>> update [overhead] tags if you have reasonable estimates.
>>
>>
>> With all of that said, here's a brief update of the progress we've made
>> so far:
>>
>> In the past month, unique memory per process[4] has dropped 3-4MB[5], and
>> JS memory usage in particular has dropped 1.1-1.9MB.
>>
>> Particular credit goes to:
>>
>> * Eric Rahm added an AWSY test suite to track base content process memory
>>(https://bugzil.la/1442361). Results:
>>
>> Resident unique: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684862,1,4&series=mozilla-central,1684846,1,4&series=mozilla-central,1685133,1,4&series=mozilla-central,1685127,1,4
>> Explicit allocations: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1706218,1,4&series=mozilla-inbound,1706220,1,4&series=mozilla-inbound,1706216,1,4
>> JS: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684866,1,4&series=mozilla-central,1685137,1,4&series=mozilla-central,1685131,1,4
>>
>> * Andrew McCreight 

Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Tom Ritter
On Wed, Jul 11, 2018 at 6:25 PM, Karl Tomlinson  wrote:

> Is there a guideline that should be used to evaluate what can
> acceptably run in the same process for different sites?
>


This is on me to write. I have been slow at doing so mainly because there's
a lot of "What does X look like and where do its pats run" investigation I
feel I need to do to write it. (For X in at least { WebExtensions, WebRTC,
Compositing, Filters, ... })



> I assume the primary goal is to prevent one site from reading
> information that should only be available to another site?
>

Yep.



On Wed, Jul 11, 2018 at 6:56 PM, Robert O'Callahan 
wrote:

> On Thu, Jul 12, 2018 at 11:25 AM, Karl Tomlinson 
> wrote:
>
> > Would it be easier to answer the opposite question?  What should
> > not run in a shared process?  JS is a given.  Others?
> >
>
> Currently when an exploitable bug is found in content process code,
> attackers use JS to weaponize it with an arsenal of known techniques (e.g.
> heap spraying and shaping). An important question is: assuming a
> similar bug were found in a shared non-content process, how difficult would
> it be for content JS to apply those techniques remotely across the process
> boundary?


You're completely correct.


> That would be a pretty interesting problem for security
> researchers to work on.
>

It's always illustrative to have exploits that demonstrate this goal in the
target of interest - they may have created generic techniques that we can
address fundamentally (like with Memory Partitioning or Allocator
Hardening).  But people have been writing exploits for targets that don't
have a scripting environment for two decades or more, so all of those are
prior art for this sort of exploitation.  This isn't a reason not to pursue
this work, and it's not saying this work isn't a net security win though!

I have been pondering (and brainstormed with a few people) about creating
something Google native-client-like to enforce process-like state
separation between threads in a single process. That might make it safer to
share utility processes between content processes. But it's considerably
less straightforward than I was hoping. Big open research question.


Use of system font, graphics, or audio servers is in a similar bucket I
> > guess.
> >
>
> Taking control of an audio server would let you listen into phone calls,
> which seems interesting.
>
> Another question is whether you can exfiltrate cross-origin data by
> performing side-channel attacks against those shared processes. You
> probably need to assume that Spectre-ish attacks will be blocked at process
> boundaries by hardware/OS mitigations, but there could be
> browser-implementation-specific timing attacks etc. E.g. do IPDL IDs
> exposed to content processes leak useful information about the activities
> of other processes? Of course there are cross-origin timing-based
> information leaks that are already known and somewhat unfixable :-(.


Yup!

-tom


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-12 Thread Emilio Cobos Álvarez

Thanks for doing this!

Just curious, is there a bug on file to measure excess capacity on 
nsTArrays and hash tables?


WebKit has a bunch of bugs like:

  https://bugs.webkit.org/show_bug.cgi?id=186709

Which seem relevant.

 -- Emilio

On 07/10/2018 08:19 PM, Kris Maglione wrote:

Welcome to the first edition of the Fission MemShrink newsletter.[1]

In this edition, I'll sum up what the project is, and why it matters to 
you. In subsequent editions, I'll give updates on progress that we've 
made, and areas that we'll need to focus on next.[2]



The Fission MemShrink project is one of the most easily overlooked 
aspects of Project Fission (also known as Site Isolation), but is 
absolutely critical to its success. And will require a company- and 
community-wide effort to meet its goals.


The problem is thus: In order for site isolation to work, we need to be 
able to run *at least* 100 content processes in an average Firefox 
session. Each of those processes has its own base memory overhead—memory 
we use just for creating the process, regardless of what's running in 
it. In the post-Fission world, that overhead needs to be less than 10MB 
per process in order to keep the extra overhead from Fission below 1GB. 
Right now, on our best-case platform, Windows 10, it's somewhere between 
17 and 21MB. Linux and OS-X hover between 25 and 35MB. In other words, 
between 2 and 3.5GB for an ordinary session.


That means that, in the best case, we need to reduce the memory we use 
in content processes by *at least* 7MB. The problem, of course, is that 
there are only so many places we can cut memory without losing 
functionality, and even fewer places where we can make big wins. But, 
there are lots of places we can make small and medium-sized wins.


So, to put the task into perspective, of all of the places we can cut a 
certain amount of overhead, here are the number of each that we need to 
fix in order to reach 1MB:


250KB:   4
100KB:  10
75KB:   13
50KB:   20
20KB:   50
10KB:  100
5KB:   200

Now remember: we need to do *all* of these in order to reach our goal. 
It's not a matter of one 250KB improvement or 50 5KB improvements. It's 
4 250KB *and* 200 5KB improvements. There just aren't enough places we 
can cut 250KB. If we fall short in any of those areas, Project Fission 
will fail, and Firefox will be the only major browser without site 
isolation.


But it won't fail, because all of you are awesome, and this is a totally 
achievable goal if we all throw our effort behind it.


Essentially what this means, though, is that if we identify an area of 
overhead that's 50KB[3] or larger that can be eliminated, it *has* to be 
eliminated. There just aren't that many large chunks to remove. They all 
need to go. And if an area of code has a dozen 5KB chunks that can be 
eliminated, maybe they don't all have to go, but at least half of them 
do. The more the better.



To help us triage these issues, we have a tracking bug 
(https://bugzil.la/memshrink-content), and a per-bug whiteboard tag 
([overhead:...]) which gives an estimate of how much per-process 
overhead we believe fixing that bug would eliminate. Please feel free to 
add blockers to the tracking bug if you think they're relevant, and to 
add or update [overhead] tags if you have reasonable estimates.



With all of that said, here's a brief update of the progress we've made 
so far:


In the past month, unique memory per process[4] has dropped 3-4MB[5], 
and JS memory usage in particular has dropped 1.1-1.9MB.


Particular credit goes to:

* Eric Rahm added an AWSY test suite to track base content process memory
   (https://bugzil.la/1442361). Results:

    Resident unique: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684862,1,4&series=mozilla-central,1684846,1,4&series=mozilla-central,1685133,1,4&series=mozilla-central,1685127,1,4 

    Explicit allocations: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1706218,1,4&series=mozilla-inbound,1706220,1,4&series=mozilla-inbound,1706216,1,4 

    JS: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684866,1,4&series=mozilla-central,1685137,1,4&series=mozilla-central,1685131,1,4 



* Andrew McCreight created a tool for tracking JS memory usage, and 
figuring

   out which scripts and objects are responsible for how much of it
   (https://bugzil.la/1463569).

* Andrew and Nika Layzell also completely rewrote the way we handle 
XPIDL type
   info so that it's statically compiled into the executable and shared 
between

   all processes (https://bugzil.la/1438688, https://bugzil.la/1444745).

* Felipe Gomes split a bunch of code out of frame scripts so that it 
could be
   lazily loaded only when needed (https://bugzil.la/1467278, ...) and 
added a
   whitelist of JSMs that are allowed to be loaded at content process 
startup

   (https://bugzil.la/1471066)

* I did a bit of this too, and also prevented us from loading some other 
JSMs
   before we need them 

Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Robert O'Callahan
On Thu, Jul 12, 2018 at 11:25 AM, Karl Tomlinson  wrote:

> Would it be easier to answer the opposite question?  What should
> not run in a shared process?  JS is a given.  Others?
>

Currently when an exploitable bug is found in content process code,
attackers use JS to weaponize it with an arsenal of known techniques (e.g.
heap spraying and shaping). An important question is: assuming a
similar bug were found in a shared non-content process, how difficult would
it be for content JS to apply those techniques remotely across the process
boundary? That would be a pretty interesting problem for security
researchers to work on.

Use of system font, graphics, or audio servers is in a similar bucket I
> guess.
>

Taking control of an audio server would let you listen into phone calls,
which seems interesting.

Another question is whether you can exfiltrate cross-origin data by
performing side-channel attacks against those shared processes. You
probably need to assume that Spectre-ish attacks will be blocked at process
boundaries by hardware/OS mitigations, but there could be
browser-implementation-specific timing attacks etc. E.g. do IPDL IDs
exposed to content processes leak useful information about the activities
of other processes? Of course there are cross-origin timing-based
information leaks that are already known and somewhat unfixable :-(.

Rob
-- 
Su ot deraeppa sah dna Rehtaf eht htiw saw hcihw, efil lanrete eht uoy ot
mialcorp ew dna, ti ot yfitset dna ti nees evah ew; deraeppa efil eht. Efil
fo Drow eht gninrecnoc mialcorp ew siht - dehcuot evah sdnah ruo dna ta
dekool evah ew hcihw, seye ruo htiw nees evah ew hcihw, draeh evah ew
hcihw, gninnigeb eht morf saw hcihw taht.


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Karl Tomlinson
Is there a guideline that should be used to evaluate what can
acceptably run in the same process for different sites?

I assume the primary goal is to prevent one site from reading
information that should only be available to another site?

There would also be defense-in-depth value from having each site
sandboxed separately because a security breach from one site could
not compromise another.

I guess a single compositor process is acceptable because there is
essentially no information returning from the compositor?

A font server may be acceptable, because information returned is
of limited power?

Use of system font, graphics, or audio servers is in a similar
bucket I guess.

Would using a single process for network be acceptable, not
because information returned is limited, but because we're willing
to have some compromise because there is a small API surface?  Or
would that be acceptable because content JS does not run in that
process?

Would it be acceptable to perform layout in a single process for
multiple sites (if that were practical)?

Would it be easier to answer the opposite question?  What should
not run in a shared process?  JS is a given.  Others?


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Kris Maglione

On Wed, Jul 11, 2018 at 11:42:01PM +0200, Jean-Yves Avenard wrote:

On 11 Jul 2018, at 10:10 pm, Kris Maglione  wrote:
It looks like it will be helpful, but unfortunately won't give us the 2MB 
simple arithmetic would suggest. On Windows, at least, (and probably 
elsewhere, but need to confirm) thread stacks are lazily committed, so as long 
as the decoders aren't used in a process, the overhead is probably closer to 
25KB per thread.


I haven’t looked much in details, not being an expert on this and having just 
finished watching the world cup…


A quick glance at the code gives me:

On mac/linux using pthread:
when a thread is created, the stack size is set using pthread_attr_setstacksize
https://searchfox.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptthread.c#355

On Linux, the man page is clear:
"The stack size attribute determines the minimum size (in bytes) that will be 
allocated for threads created using the thread attributes object attr.”


Right, but allocation size doesn't imply that the memory is committed, just that 
it's mapped. In general, anonymous mapped memory isn't actually committed (and 
therefore doesn't become part of the process's USS) until it's touched.



On Windows:
https://searchfox.org/mozilla-central/source/nsprpub/pr/src/md/windows/w95thred.c#151

the thread is created with STACK_SIZE_PARAM_IS_A_RESERVATION flag set. This 
will allocate the memory immediately.


Allocate, yes, but not commit. That flag is actually what ensures that our 
Windows thread stacks don't consume system memory until they're actually 
touched.


The saving I was mentioning earlier isn’t just due to media decoder threadpool 
thread stack no longer needing to be that big, but that all other threadpools 
can be reduced too. Threadpools aren’t used only when playing a video/audio 
file.


Reducing thread pool sizes would certainly be helpful. One unfortunate 
side-effect of large thread pools is that, even with lazy commit thread stacks, 
the more threads you run code on, the more stacks wind up with committed pages.



Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Mike Hommey
On Wed, Jul 11, 2018 at 11:42:01PM +0200, Jean-Yves Avenard wrote:
> Hi
> 
> > On 11 Jul 2018, at 10:10 pm, Kris Maglione  wrote:
> > Thanks. Boris added this as a blocker.
> > 
> > It looks like it will be helpful, but unfortunately won't give us the 2MB 
> > simple arithmetic would suggest. On Windows, at least, (and probably 
> > elsewhere, but need to confirm) thread stacks are lazily committed, so as 
> > long as the decoders aren't used in a process, the overhead is probably 
> > closer to 25KB per thread.
> > 
> > Shrinking the size of the thread pool and lazily spinning up threads when 
> > they're first needed would probably save us 200KB per process, though...
> 
> I haven’t looked much in details, not being an expert on this and having just 
> finished watching the world cup…
> 
> A quick glance at the code gives me:
> 
> On mac/linux using pthread:
> when a thread is created, the stack size is set using 
> pthread_attr_setstacksize
> https://searchfox.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptthread.c#355
> 
> On Linux, the man page is clear:
> "The stack size attribute determines the minimum size (in bytes) that will be 
> allocated for threads created using the thread attributes object attr.”
> 
> On mac, less so, I’m not sure what’s the behaviour there is, if it’s 
> allocated or not…
> 
> On Windows:
> https://searchfox.org/mozilla-central/source/nsprpub/pr/src/md/windows/w95thred.c#151
> 
> the thread is created with STACK_SIZE_PARAM_IS_A_RESERVATION flag set. This 
> will allocate the memory immediately.

Allocate in this context means address space being consumed. It doesn't
mean memory being actually committed. Memory is only committed once
used, so only as much as what the code running in the thread actually
uses is committed (rounded to page size).

This means at least 4k per thread, so the more threads we have at
initialization, the more memory is committed. That being said, we're
talking about something akin to NUWA here, and presumably, we're talking
about processes that don't initialize everything.

Mike


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Jean-Yves Avenard
Hi

> On 11 Jul 2018, at 10:10 pm, Kris Maglione  wrote:
> Thanks. Boris added this as a blocker.
> 
> It looks like it will be helpful, but unfortunately won't give us the 2MB 
> simple arithmetic would suggest. On Windows, at least, (and probably 
> elsewhere, but need to confirm) thread stacks are lazily committed, so as 
> long as the decoders aren't used in a process, the overhead is probably 
> closer to 25KB per thread.
> 
> Shrinking the size of the thread pool and lazily spinning up threads when 
> they're first needed would probably save us 200KB per process, though...

I haven’t looked much in details, not being an expert on this and having just 
finished watching the world cup…

A quick glance at the code gives me:

On mac/linux using pthread:
when a thread is created, the stack size is set using pthread_attr_setstacksize
https://searchfox.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptthread.c#355

On Linux, the man page is clear:
"The stack size attribute determines the minimum size (in bytes) that will be 
allocated for threads created using the thread attributes object attr.”

On mac, less so, I’m not sure what’s the behaviour there is, if it’s allocated 
or not…

On Windows:
https://searchfox.org/mozilla-central/source/nsprpub/pr/src/md/windows/w95thred.c#151

the thread is created with STACK_SIZE_PARAM_IS_A_RESERVATION flag set. This 
will allocate the memory immediately.

The saving I was mentioning earlier isn’t just due to media decoder threadpool 
thread stack no longer needing to be that big, but that all other threadpools 
can be reduced too. Threadpools aren’t used only when playing a video/audio 
file.

Anyway, this needs further inspection… we’ll know soon :)

I do hope that the 100 process figures scenario that was given is a worst-case 
scenario though...
JY





Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Kris Maglione

On Wed, Jul 11, 2018 at 01:49:04PM +0200, Jean-Yves Avenard wrote:

There’s one place where we could gain heaps: the media stack.
Currently, each content process allocates a thread pool with at least 8
threads for use by the media decoders, each thread with a default stack size of
256kB.

(https://searchfox.org/mozilla-central/source/xpcom/threads/nsIThreadManager.idl#53)

That stack size has been increased over the years due to the growing use of
system frameworks (in particular the mac CoreVideo framework, which alone uses
over 200kB), and right now even 256kB isn’t enough for the new AV1
decoder from libaom.


One piece of work the media team has started is to have all those decoders run
in a dedicated process. This work was mostly motivated by security, but there
will be side gains memory-wise.


This work is tracked in bug 1471535 
(https://bugzilla.mozilla.org/show_bug.cgi?id=1471535)


Once this is done, and we no longer call decoders in the content process,
the decoder process could use an increased stack size, while the content
process default stack size could be reduced to 128kB (and maybe even 64kB).


That alone may be sufficient to achieve the goals you mentioned.


Thanks. Boris added this as a blocker.

It looks like it will be helpful, but unfortunately won't give us the 2MB 
simple arithmetic would suggest. On Windows, at least, (and probably 
elsewhere, but need to confirm) thread stacks are lazily committed, so as long 
as the decoders aren't used in a process, the overhead is probably closer to 
25KB per thread.


Shrinking the size of the thread pool and lazily spinning up threads when 
they're first needed would probably save us 200KB per process, though...
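
As an illustration of that "lazily spin up threads" idea only (this is not
nsIThreadPool; the class and member names are made up), a lazy pool along those
lines might look something like this:

// A minimal sketch of a pool that only spawns worker threads the first time
// work actually arrives, so an idle content process pays no thread-stack cost
// for it. Not production code: no error handling, no idle-thread reaping.
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class LazyThreadPool {
 public:
  explicit LazyThreadPool(size_t aMaxThreads) : mMaxThreads(aMaxThreads) {}

  ~LazyThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mShutdown = true;
    }
    mCondVar.notify_all();
    for (auto& t : mThreads) t.join();
  }

  void Dispatch(std::function<void()> aTask) {
    std::lock_guard<std::mutex> lock(mMutex);
    mTasks.push(std::move(aTask));
    // Spin up a new worker only if every existing one is busy and we are
    // still under the cap; an unused pool therefore owns zero threads.
    if (mIdleThreads == 0 && mThreads.size() < mMaxThreads) {
      mThreads.emplace_back([this] { WorkerLoop(); });
    }
    mCondVar.notify_one();
  }

 private:
  void WorkerLoop() {
    std::unique_lock<std::mutex> lock(mMutex);
    for (;;) {
      ++mIdleThreads;
      mCondVar.wait(lock, [this] { return mShutdown || !mTasks.empty(); });
      --mIdleThreads;
      if (mShutdown && mTasks.empty()) return;
      auto task = std::move(mTasks.front());
      mTasks.pop();
      lock.unlock();
      task();  // Run the task outside the lock.
      lock.lock();
    }
  }

  const size_t mMaxThreads;
  size_t mIdleThreads = 0;
  bool mShutdown = false;
  std::mutex mMutex;
  std::condition_variable mCondVar;
  std::queue<std::function<void()>> mTasks;
  std::vector<std::thread> mThreads;
};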



An immediate intermediate step could be to use two different stack sizes, as we
pretty much know which threads need more than the others.

JY



On 10 Jul 2018, at 8:19 pm, Kris Maglione  wrote:

Welcome to the first edition of the Fission MemShrink newsletter.[1]

In this edition, I'll sum up what the project is, and why it matters to you. In 
subsequent editions, I'll give updates on progress that we've made, and areas 
that we'll need to focus on next.[2]


The Fission MemShrink project is one of the most easily overlooked aspects of 
Project Fission (also known as Site Isolation), but is absolutely critical to 
its success. And it will require a company- and community-wide effort to
meet its goals.

The problem is thus: In order for site isolation to work, we need to be able to 
run *at least* 100 content processes in an average Firefox session. Each of 
those processes has its own base memory overhead—memory we use just for 
creating the process, regardless of what's running in it. In the post-Fission 
world, that overhead needs to be less than 10MB per process in order to keep 
the extra overhead from Fission below 1GB. Right now, on our best-case
platform, Windows 10, it is somewhere between 17 and 21MB. Linux and OS X hover
between 25 and 35MB. In other words, between 2 and 3.5GB for an ordinary 
session.

That means that, in the best case, we need to reduce the memory we use in 
content processes by *at least* 7MB. The problem, of course, is that there are 
only so many places we can cut memory without losing functionality, and even 
fewer places where we can make big wins. But, there are lots of places we can 
make small and medium-sized wins.

So, to put the task into perspective, of all of the places we can cut a certain 
amount of overhead, here are the number of each that we need to fix in order to 
reach 1MB:

250KB:   4
100KB:  10
75KB:   13
50KB:   20
20KB:   50
10KB:  100
5KB:   200

Now remember: we need to do *all* of these in order to reach our goal. It's not 
a matter of one 250KB improvement or 50 5KB improvements. It's 4 250KB *and* 
200 5KB improvements. There just aren't enough places we can cut 250KB. If we 
fall short in any of those areas, Project Fission will fail, and Firefox will 
be the only major browser without site isolation.

But it won't fail, because all of you are awesome, and this is a totally 
achievable goal if we all throw our effort behind it.

Essentially what this means, though, is that if we identify an area of overhead 
that's 50KB[3] or larger that can be eliminated, it *has* to be eliminated. 
There just aren't that many large chunks to remove. They all need to go. And if 
an area of code has a dozen 5KB chunks that can be eliminated, maybe they don't 
all have to go, but at least half of them do. The more the better.


To help us triage these issues, we have a tracking bug 
(https://bugzil.la/memshrink-content), and a per-bug whiteboard tag 
([overhead:...]) which gives an estimate of how much per-process overhead we 
believe fixing that bug would eliminate. Please feel free to add blockers to 
the tracking bug if you think they're relevant, and to add or update [overhead] 
tags if you have reasonable estimates.


With all of that said, here's a brief 

Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Randell Jesup
>On 7/11/18 5:42 AM, David Bruant wrote:
>> I've seen this information of 100 content processes in a couple places but
>> i haven't been able to find the rationale for it. How was the 100 number
>> picked?
>
>I believe this is based on telemetry for number of distinct sites involved
>in browsing sessions.

As an example, 10 randomly chosen tabs in Chrome site isolation (a few
months ago) yielded ~80 renderers (Content processes).  Some sites
generate a lot; that list of 10 included some which likely don't
generate more than 1 or 2: google.com, mozilla.org, facebook login page,
wikipedia (might spawn a few?).

>> Would 90 prevent a release of project fission?
>
>It would make it harder to ship to users, yes...  Whether it "prevents"
>would depend on other considerations.

It's a continuum - the more memory we use, the more OOMs, the worse
we'll look (relative to Chrome), the larger impact on system perf, etc.
There's likely no hard line, but there may be a defined "we need to get
at least here" line, and for now that's 100 apparently (I wasn't
directly involved in picking it, so I don't know how "hard" it is).

We'll have to do more than just limit process sizes, but limiting
process sizes is basically table stakes, IMO.

-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Kris Maglione

On Wed, Jul 11, 2018 at 02:42:11PM +0200, David Bruant wrote:

2018-07-10 20:19 GMT+02:00 Kris Maglione :


The problem is thus: In order for site isolation to work, we need to be
able to run *at least* 100 content processes in an average Firefox session


I've seen this information of 100 content processes in a couple places but
i haven't been able to find the rationale for it. How was the 100 number
picked?


So, the basic problem here is that we don't get to choose the number of 
content processes we'll have. It will depend entirely on the number of 
origins that we load documents from at any given time. In practice, the 
biggest contributing factor to that number tends to be iframes (mostly 
for things like ads and social widgets).


The "100 processes" number was initially chosen based on experimentation 
(basically, counting the number of origins loaded by typical pages on 
certain popular sites) and our knowledge of typical usage patterns. It's 
meant to be a conservative estimate of the number of processes typical 
users are likely to hit on a regular basis, though hopefully not all the 
time.


For heavy users, we expect the number to be much higher[1]. And while those 
users typically have more RAM to spare, they also tend not to be happy 
when we waste it.


We also need to add to that number the Activity Stream process that 
hosts things like about:newtab and about:home, the system extension 
process, processes for any other extensions the user has installed 
(which will each likely need their own processes for the same reasons 
each content origin will), and the pre-loaded web content process[4].



We've been working on improving our estimates by collecting telemetry on 
the number of document groups[2] per tab group[3]:


https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=1_date=2018-06-30=__none__!__none__!__none___channel_version=nightly%252F63=TOTAL_HTTP_DOCGROUPS_PER_TABGROUP_channel_version=null=*=Firefox=0_keys=submissions_date=2018-06-25=0=1_submission_date=0

But we don't have enough data to draw conclusions yet.


Would 90 prevent a release of project fission?


This isn't really something we get to choose. The closest I can come is 
something like "would an overhead of 1.1GB prevent a release of project 
Fission". And, while the answer may turn out to be "no", I'd prefer not 
to speculate, because that's a decision we'd wind up paying for with 
user dissatisfaction.


There are some other hacks that we can use to decrease the overall 
overhead, like aggressively unloading background tabs, and flushing 
their resources. We're almost certainly going to wind up having to do 
some of that regardless, but it comes at a performance cost. The more 
aggressive we have to be about it, the less responsive the browser is 
going to wind up being. So, again, the shorter we fall on our memory 
reduction efforts, the more we're going to pay in terms of user 
satisfaction.



How will the rollout happen?
  Will the rollout happen progressively (like 2 content processes soon, 4
soon after, 10 some time after, etc.) or does it have to be 1 (current
situation IIUC) then 100?


* Andrew McCreight created a tool for tracking JS memory usage, and figuring

  out which scripts and objects are responsible for how much of it
  (https://bugzil.la/1463569).


How often is this code run? Is there a place to find the daily output of
this tool applied to a nightly build for instance?


For the moment, it requires a patched build of Firefox, so we've been 
running it locally as we try to track down and fix memory issues, and 
Andrew has been periodically updating the numbers in the bug.


I believe Andrew has been working on updating the patch to a land-able 
state (which is non-trivial), after which we'll hopefully be able to get 
up-to-date numbers from automation.



[1]: Particularly readers of TechCrunch, which regularly loads 30 
origins on a single page.

[2]: Essentially documents of different origin.
[3]: Essentially sets of tabs that are tied together because they were 
opened by things like window.open() calls or link clicks from other 
tabs.
[4]: Which currently have only one of, but may need more of in the 
future in order to support loading several iframes in a given page 
without noticeable lag or jank.



Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Andrew McCreight
On Wed, Jul 11, 2018 at 5:42 AM, David Bruant  wrote:

>
>
> * Andrew McCreight created a tool for tracking JS memory usage, and
>> figuring
>>   out which scripts and objects are responsible for how much of it
>>   (https://bugzil.la/1463569).
>>
> How often is this code run? Is there a place to find the daily output of
> this tool applied to a nightly build for instance?
>

You have to manually run this using a special build (hopefully I'll be able
to at least land code so that a special build is not needed). It isn't
clear from that description, but the focus here is on the chrome JS that is
part of the browser, rather than on websites. Reducing content process
chrome JS memory usage is going to have to be a big focus for this effort,
because I believe other browsers don't write their UI in JS, and the way
JIT stuff works it is harder to share code memory between processes than
with AOT compiled code.

If you look at about:memory, there's already a decent breakdown of how much
memory is used in JS for different things, but that doesn't help you figure
out which individual scripts are taking up memory. JSMs and content scripts
are run in only a few globals (to save memory), but that means that looking
up how much memory a global uses doesn't tell you much.


Andrew


> Thanks again,
>
> David


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Boris Zbarsky

On 7/11/18 5:42 AM, David Bruant wrote:

I've seen this information of 100 content processes in a couple places but
i haven't been able to find the rationale for it. How was the 100 number
picked?


I believe this is based on telemetry for number of distinct sites 
involved in browsing sessions.



Would 90 prevent a release of project fission?


It would make it harder to ship to users, yes...  Whether it "prevents" 
would depend on other considerations.



Will the rollout happen progressively (like 2 content processes soon, 4
soon after, 10 some time after, etc.) or does it have to be 1 (current
situation IIUC)


Current situation is 4 processes.

How we scale up from there is TBD.

-Boris


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread David Bruant
Thanks Kris for all this information and the beginning of the first issue
of this newsletter!

2018-07-10 20:19 GMT+02:00 Kris Maglione :

> The problem is thus: In order for site isolation to work, we need to be
> able to run *at least* 100 content processes in an average Firefox session

I've seen this information of 100 content processes in a couple places but
i haven't been able to find the rationale for it. How was the 100 number
picked? Would 90 prevent a release of project fission?
How will the rollout happen?
   Will the rollout happen progressively (like 2 content processes soon, 4
soon after, 10 some time after, etc.) or does it have to be 1 (current
situation IIUC) then 100?


* Andrew McCreight created a tool for tracking JS memory usage, and figuring
>   out which scripts and objects are responsible for how much of it
>   (https://bugzil.la/1463569).
>
How often is this code run? Is there a place to find the daily output of
this tool applied to a nightly build for instance?

Thanks again,

David


Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-11 Thread Jean-Yves Avenard
Hi

That’s great info, thank you.

There’s one place where we could gain heaps: the media stack.
Currently, each content process allocates a thread pool with at least 8 threads
for use by the media decoders, each thread with a default stack size of 256kB.
(https://searchfox.org/mozilla-central/source/xpcom/threads/nsIThreadManager.idl#53)

That stack size has been increased over the years due to the growing use of
system frameworks (in particular the mac CoreVideo framework, which alone uses
over 200kB), and right now even 256kB isn’t enough for the new AV1 decoder from
libaom.

One piece of work the media team has started is to have all those decoders run
in a dedicated process. This work was mostly motivated by security, but there
will be side gains memory-wise.

This work is tracked in bug 1471535 
(https://bugzilla.mozilla.org/show_bug.cgi?id=1471535)

Once this is done, and we no longer call decoders in the content process, the
decoder process could use an increased stack size, while the content process
default stack size could be reduced to 128kB (and maybe even 64kB).

That alone may be sufficient to achieve the goals you mentioned.

An immediate intermediate step could be to use two different stack sizes, as we
pretty much know which threads need more than the others.

JY


> On 10 Jul 2018, at 8:19 pm, Kris Maglione  wrote:
> 
> Welcome to the first edition of the Fission MemShrink newsletter.[1]
> 
> In this edition, I'll sum up what the project is, and why it matters to you. 
> In subsequent editions, I'll give updates on progress that we've made, and 
> areas that we'll need to focus on next.[2]
> 
> 
> The Fission MemShrink project is one of the most easily overlooked aspects of 
> Project Fission (also known as Site Isolation), but is absolutely critical to 
> its success. And it will require a company- and community-wide effort to
> meet its goals.
> 
> The problem is thus: In order for site isolation to work, we need to be able 
> to run *at least* 100 content processes in an average Firefox session. Each 
> of those processes has its own base memory overhead—memory we use just for 
> creating the process, regardless of what's running in it. In the post-Fission 
> world, that overhead needs to be less than 10MB per process in order to keep 
> the extra overhead from Fission below 1GB. Right now, on our best-cast 
> platform, Windows 10, is somewhere between 17 and 21MB. Linux and OS-X hover 
> between 25 and 35MB. In other words, between 2 and 3.5GB for an ordinary 
> session.
> 
> That means that, in the best case, we need to reduce the memory we use in 
> content processes by *at least* 7MB. The problem, of course, is that there 
> are only so many places we can cut memory without losing functionality, and 
> even fewer places where we can make big wins. But, there are lots of places 
> we can make small and medium-sized wins.
> 
> So, to put the task into perspective, of all of the places we can cut a 
> certain amount of overhead, here are the number of each that we need to fix 
> in order to reach 1MB:
> 
> 250KB:   4
> 100KB:  10
> 75KB:   13
> 50KB:   20
> 20KB:   50
> 10KB:  100
> 5KB:   200
> 
> Now remember: we need to do *all* of these in order to reach our goal. It's 
> not a matter of one 250KB improvement or 50 5KB improvements. It's 4 250KB 
> *and* 200 5KB improvements. There just aren't enough places we can cut 250KB. 
> If we fall short in any of those areas, Project Fission will fail, and 
> Firefox will be the only major browser without site isolation.
> 
> But it won't fail, because all of you are awesome, and this is a totally 
> achievable goal if we all throw our effort behind it.
> 
> Essentially what this means, though, is that if we identify an area of 
> overhead that's 50KB[3] or larger that can be eliminated, it *has* to be 
> eliminated. There just aren't that many large chunks to remove. They all need 
> to go. And if an area of code has a dozen 5KB chunks that can be eliminated, 
> maybe they don't all have to go, but at least half of them do. The more the 
> better.
> 
> 
> To help us triage these issues, we have a tracking bug 
> (https://bugzil.la/memshrink-content), and a per-bug whiteboard tag 
> ([overhead:...]) which gives an estimate of how much per-process overhead we 
> believe fixing that bug would eliminate. Please feel free to add blockers to 
> the tracking bug if you think they're relevant, and to add or update 
> [overhead] tags if you have reasonable estimates.
> 
> 
> With all of that said, here's a brief update of the progress we've made so 
> far:
> 
> In the past month, unique memory per process[4] has dropped 3-4MB[5], and JS 
> memory usage in particular has dropped 1.1-1.9MB.
> 
> Particular credit goes to:
> 
> * Eric Rahm added an AWSY test suite to track base content process memory
>  (https://bugzil.la/1442361). Results:
> 
>   Resident unique: 
> 

Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-10 Thread Randell Jesup
>Welcome to the first edition of the Fission MemShrink newsletter.[1]

This is awesome and critical.

I'll note (and many of you know this well) that in addition to getting
rid of allocations (or making them lazy), another primary solution is to
move data out of the Content processes, and into the master process (or
some other shared process, if that's advisable for security or other
reasons), and access the data over IPC.  Or you can move it to a shared
memory block (with appropriate locking if not static).  For example, on
Linux one of our worst offenders is fontconfig; Chrome remotes much of that
to the master process.
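
As an illustration of the shared-memory variant (plain POSIX APIs, not Gecko's
IPC/shared-memory layer; the name "/moz-shared-demo" and the size are made up,
and error handling is omitted for brevity):

// Minimal sketch of "share it instead of duplicating it": a parent process
// publishes a static data block once, and every child maps the same physical
// pages read-only, so N content processes pay for the data once, not N times.
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

static const char kName[] = "/moz-shared-demo";  // hypothetical name
static const size_t kSize = 4096;

// Parent: create and fill the block once (assumes aLen <= kSize).
void PublishSharedBlock(const char* aData, size_t aLen) {
  int fd = shm_open(kName, O_CREAT | O_RDWR, 0600);
  ftruncate(fd, kSize);
  void* p = mmap(nullptr, kSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  memcpy(p, aData, aLen);
  munmap(p, kSize);
  close(fd);
}

// Child: map the same pages read-only; no per-process copy is made.
const char* MapSharedBlock() {
  int fd = shm_open(kName, O_RDONLY, 0);
  const void* p = mmap(nullptr, kSize, PROT_READ, MAP_SHARED, fd, 0);
  close(fd);  // The mapping stays valid after the fd is closed.
  return static_cast<const char*>(p);
}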

-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email


Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

2018-07-10 Thread Kris Maglione

Welcome to the first edition of the Fission MemShrink newsletter.[1]

In this edition, I'll sum up what the project is, and why it matters to you. 
In subsequent editions, I'll give updates on progress that we've made, and 
areas that we'll need to focus on next.[2]



The Fission MemShrink project is one of the most easily overlooked aspects of 
Project Fission (also known as Site Isolation), but is absolutely critical to 
its success. And it will require a company- and community-wide effort to
meet its goals.


The problem is thus: In order for site isolation to work, we need to be able 
to run *at least* 100 content processes in an average Firefox session. Each of 
those processes has its own base memory overhead—memory we use just for 
creating the process, regardless of what's running in it. In the post-Fission 
world, that overhead needs to be less than 10MB per process in order to keep the 
extra overhead from Fission below 1GB. Right now, on our best-case platform,
Windows 10, it is somewhere between 17 and 21MB. Linux and OS X hover between 25
and 35MB. In other words, between 2 and 3.5GB for an ordinary session.


That means that, in the best case, we need to reduce the memory we use in 
content processes by *at least* 7MB. The problem, of course, is that there are 
only so many places we can cut memory without losing functionality, and even 
fewer places where we can make big wins. But, there are lots of places we can 
make small and medium-sized wins.


So, to put the task into perspective, of all of the places we can cut a 
certain amount of overhead, here are the number of each that we need to fix in 
order to reach 1MB:


250KB:   4
100KB:  10
75KB:   13
50KB:   20
20KB:   50
10KB:  100
5KB:   200

Now remember: we need to do *all* of these in order to reach our goal. It's 
not a matter of one 250KB improvement or 50 5KB improvements. It's 4 250KB *and* 
200 5KB improvements. There just aren't enough places we can cut 250KB. If we 
fall short in any of those areas, Project Fission will fail, and Firefox will be 
the only major browser without site isolation.


But it won't fail, because all of you are awesome, and this is a totally 
achievable goal if we all throw our effort behind it.


Essentially what this means, though, is that if we identify an area of 
overhead that's 50KB[3] or larger that can be eliminated, it *has* to be 
eliminated. There just aren't that many large chunks to remove. They all need 
to go. And if an area of code has a dozen 5KB chunks that can be eliminated, 
maybe they don't all have to go, but at least half of them do. The more the 
better.



To help us triage these issues, we have a tracking bug (https://bugzil.la/memshrink-content), 
and a per-bug whiteboard tag ([overhead:...]) which gives an estimate of how 
much per-process overhead we believe fixing that bug would eliminate. Please 
feel free to add blockers to the tracking bug if you think they're relevant, and 
to add or update [overhead] tags if you have reasonable estimates.



With all of that said, here's a brief update of the progress we've made so far:

In the past month, unique memory per process[4] has dropped 3-4MB[5], and JS 
memory usage in particular has dropped 1.1-1.9MB.


Particular credit goes to:

* Eric Rahm added an AWSY test suite to track base content process memory
  (https://bugzil.la/1442361). Results:

   Resident unique: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684862,1,4=mozilla-central,1684846,1,4=mozilla-central,1685133,1,4=mozilla-central,1685127,1,4
   Explicit allocations: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1706218,1,4=mozilla-inbound,1706220,1,4=mozilla-inbound,1706216,1,4
   JS: 
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684866,1,4=mozilla-central,1685137,1,4=mozilla-central,1685131,1,4

* Andrew McCreight created a tool for tracking JS memory usage, and figuring
  out which scripts and objects are responsible for how much of it
  (https://bugzil.la/1463569).

* Andrew and Nika Layzell also completely rewrote the way we handle XPIDL type
  info so that it's statically compiled into the executable and shared between
  all processes (https://bugzil.la/1438688, https://bugzil.la/1444745).

* Felipe Gomes split a bunch of code out of frame scripts so that it could be
  lazily loaded only when needed (https://bugzil.la/1467278, ...) and added a
  whitelist of JSMs that are allowed to be loaded at content process startup
  (https://bugzil.la/1471066)

* I did a bit of this too, and also prevented us from loading some other JSMs
  before we need them (https://bugzil.la/1470333, https://bugzil.la/1469719,
  ...)

* Nick Nethercote made dynamic nsAtoms allocate their string storage inline
  rather than use a refcounted StringBuffer (https://bugzil.la/1447951)

* Emilio Álvarez reduced the amount of memory the Gecko Profiler uses in
  content processes.

* Nathan Froyd fixed