Measuring interactivity in a browser (especially during load)
to enable page authors to do likewise. Options: * Jank * TTI/TTFI (or FCI) * See above for issues with it * 5-second window mostly ensures the page is actually done loading/initializing. * MID (Median Input Delay - see my posting here) * Measures expected input delay during load * Exact definition of what value to derive from the data TBD (Mean, Median, 95%, Max, etc) * Issues: * Since it’s calculating a (single) value-over-time, when you stop has a major impact on the final value. Starting time also has a major impact, since while the browser is loading the initial page data it’s likely highly interactive. * Stop on Page Ready? * Start on last byte delivered of main page? FCP? * Stop at the TTI/TTFI point? * Stop at Visually Complete (or 85%) plus some delta? Start at 10% visually complete? 30%? (and which “visually complete”?) * Graphs of expected input delay * Gives continuous readout of delay * Can works with profiler counter display * No issue defining a stop-time, though you need to stop measuring sometime. * Issues: * Doesn’t provide a single number to compare * Can provide mean/avg/95%/etc numbers, but then you have the “stopping” problem again * Could use visually complete * Page Ready * Detect when page becomes idle * Page going idle CPU-wise isn’t sufficient * Need to ignore housekeeping activity, like animations, polls, etc. * TTFI/FCI is meant to be this, more or less * All scripts of type must be ready * [All short-duration timeouts during load have run.] * Likely unnecessary In reality, we want to measure more than one thing here. We want to know when the page is ready; and we want to understand how janky the browser is during a page load (and later as well, though that’s a different question). Graphs are intuitive visually, but a graph of delay over a load is hard for automation to do something with directly, or alert on. We can track Max Delay easily, since that doesn’t have the stopping (or starting) problems. We could also track time-with-delay-worse-than-X, which is relatively free of the starting/stopping problem, but still may be slightly impacted. For example, we could have these sorts of results (hypothetical): * Foo.com: * Onload: 10.2s * TTFI: 15.4s * FCP: 3.4s * Time-To-Page-Ready: 11.2s * Max-Input-Delay: 900ms * Time-Non-Responsive(50ms): 6.5s (over period start-to-TTFI) * Time-Non-Responsive(100ms): 4.3s * TIme-Non-Responsive(250ms): 1.2s * Avg, Mean, 5%, 95% for Responsiveness (over what period?)[c] The most interesting of these to track may be Time-To-Page-Ready (TTPR), Max-Input-Delay, and perhaps Time-Non-Responsive(50ms). Proposal: We should track the input delay loading “real-world” pages such as in the load tests in Raptor, and let the tests run until TTFI or until N (10?) seconds after the load event. We should track the input delay as finely as we can (preferably 1ms), but no worse than once every ~33ms. This should be available via browsertime, the Gecko Profiler, and also in Raptor run in automation. Since the delay graph is closely tied to the specific run, we’ll want to view the series in Raptor or browsertime, as well as get the raw numbers directly. Viewing them in the profiler (as a track) is especially useful, since they’re directly tied to the code and events running. With this data, we can develop appropriate metrics that can be then alerted on or tracked. Ideally we can define a single/few alertable metrics from the data. Once we have graphed data, we can calculate what makes sense and how well this maps to reality. Initially we should try calculating several of the above metrics, and setting up some datasets were we can play with the start and end-points and what we calculate from the values. We can then start tracking some of these in perfherder and see if they’re stable enough to use as alerts. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Remove browser and OS architecture from Firefox's User-Agent string?
>On Fri, May 10, 2019 at 11:40 PM Chris Peterson wrote: >> I propose that Win64 and WOW64 use the unadorned Windows UA already used >> by Firefox on x86 and AArch64 Windows: >> >> < "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 >> Firefox/66.0" >> > "Mozilla/5.0 (Windows NT 10.0; rv:66.0) Gecko/20100101 Firefox/66.0" > >Would there be significant downsides to hard-coding the Windows >version to "10.0" in order to put Windows 7 and 8.x users in the same >anonymity set with Windows 10 users? > >(We could still publish statistics of Windows version adoption at >https://data.firefox.com/dashboard/hardware ) I wonder if any sites distributing windows executables might key off the OS version to default to the correct exe for your version of windows? If we do this, one would presume that such sites let you select alternate versions to download already. And if not, they could, but they might not bother/notice. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Lack of browser mochitests in non-e10s configuration and support for turning off e10s on desktop going forward
Jean-Yves writes: >> If anyone is chomping at the bit to remove 1proc support from their module, >> please speak up. > >I am one of those developers that find non-e10s essential to debug core >features. I've depended on using non-e10s for ages as well. Most of my debugging can in theory be done in e10s, but whenever I have to have multiple GDBs running, and deal with breaking in one causing breaks in the other (or timeouts, etc), it's really painful, and especially for any work that crosses the Content/master boundary. Also: how will Fission affect our debugging workflows? I imagine that will make things much more complicated without better tools. >Of course we can do without, but at the expense of time and >convenience. Before non-e10s support is removed, I'd love to see better >development/debugging tools, particularly on Windows added to help our >workflow. +100 -- we need better tools for e10s debugging, especially with Fission coming. We can get away without them, and without non-e10s - but it will be more friction that lots of us will have on a daily basis. [Part of the problem is that lots of debugging tools don't deal with, or deal poorly with, multi-process apps. No revelation here, of course.] -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Announcing ./mach busted
>TL;DR - In bug 1543241 (alias 'mach-busted'), we now have a central >clearinghouse of bugs for broken mozilla-central tooling. You can invoke >|./mach busted| to get a list of known outstanding issues, and |./mach >busted file| to report a new one. In case it's not obvious (it wasn't to me), 'file' is a keyword, not a file to include, and all you really need to do is file a bug that blocks bug 1543241 (or add that to an existing bug). We also should name that bug, for easy in referencing in the future (I propose "mach-busted" ;-) ) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
PSA on MOZ_LOG entries in profiles
A short PSA about MOZ_LOG() messages: In Bug 1518030 back in January, I landed support for mirroring MOZ_LOG() messages into profiles captured with the Gecko Profiler (https://profiler.firefox.com/ formerly https://perf-html.io/). You can now add "profilermarkers" to you MOZ_LOG env variable to cause log messages to be mirrored as profiler Markers in profiles. I would suggest being a little careful about enabling excessive number of log messages when using this, and it does allocate memory for each marker so the overhead may be more than normal (a bit). That said, you can probably throw a lot of logs at it safely. You can find them in the Marker Chart or Marker Table when viewing a profile. Now back to your regularly scheduled programming ;-) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Running different instances of Firefox side-by-side
>-no-remote and -new-instance still exist. Right now they do the same thing, >they make Firefox not look for existing instances and not listen for >remoting from future instances. They are pretty pointless now though, the >only case where they would have an effect is when a second instance is >trying to use a profile that is already used by an existing instance ... at >which point we'll show the profile locked dialog on startup and refuse to >startup anyway. [snip] >The other thing that might be confusing is that the version or install of >Firefox you try to launch doesn't affect which version or install of >Firefox you might end up remoting to. This has always been the case on >Windows and normally the case on Linux, unless you pass an extra command >line argument though so I'm not too concerned here. -no-remote is still quite useful when debugging; if I build in one of my dev repos, and then ./firefox -P test -no-remote, it will warn me if I have another instance using that profile (quite possibly from a different repo/directory) instead of silents loading it in that other instance - which can be very confusing if you're trying to test a fix. ("I *swear* I made that change; why didn't it take?") >Hopefully this all makes sense. I'd like to hear if folks think that this >is the wrong way to support this and if you spot any issues with it that I >haven't. Thanks for doing this; the current system kinda fell out of various needs mostly around 20+ years ago and hadn't been revisted since then really. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: try pushes: use a base revision from 2019-02-06 or newer
>On 2/12/19 9:15 PM, Randell Jesup wrote: >>> if you push to the Try server, use base revisions (= the shared revision on >>> top of which you add your changes) from 2019-02-06 or newer, else there >>> will be many test failures due to expired certificates. The newer base >>> revisions have the fixes from >>> https://bugzilla.mozilla.org/show_bug.cgi?id=1525191 >> >> What if you need to push a Try involving an older rev (i.e. to verify a >> problem or fix)? Is there a patch you can add on top of an older rev to >> avoid failures? >> >> Has the ESR branch been updated so it can be used as a base as well? > >ESR was fixed with >https://hg.mozilla.org/releases/mozilla-esr60/rev/2a0b77c6fa9b >Unfortunately there isn't a preexisting single patch you can use to >update the certificates in mozilla-central, but depending on the tests >you're running, using some or all of the "original commit"s mentioned in >https://bugzilla.mozilla.org/show_bug.cgi?id=1525191#c26 should work. Could you pull them together into one patch, and add to that bug? That would make it a lot easier for people in the future if they need to verify something. Unless you think it would be easy for people if they need to do it. Thanks! -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: try pushes: use a base revision from 2019-02-06 or newer
>if you push to the Try server, use base revisions (= the shared revision on >top of which you add your changes) from 2019-02-06 or newer, else there >will be many test failures due to expired certificates. The newer base >revisions have the fixes from >https://bugzilla.mozilla.org/show_bug.cgi?id=1525191 What if you need to push a Try involving an older rev (i.e. to verify a problem or fix)? Is there a patch you can add on top of an older rev to avoid failures? Has the ESR branch been updated so it can be used as a base as well? -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Process Priority Manager enabled on Nightly for Windows (only)
>Hey Valentin, > >That's a good question. I haven't yet noticed any difference yet, but >I'm hoping people can keep an eye out to see if there's appreciable lag >when switching back to a tab with video, or if background audio starts >to stutter. Also this could impact webrtc calls or perhaps webrtc-based datachannel applications (file transfers, games, etc, though likely if it's just priority this won't be a problem). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: New and improved "about:config" for Firefox Desktop
>On 1/26/2019 10:09 AM, khagar...@gmail.com wrote: >> Does it take into account that the sorting is preserved between about:config >> calls? > >No, but 0.4% is still very low. We could imagine that a lot of people >keep the table sorted by type at all times, or that only a few people >do or even know that they can sort, depending on where our confirmation >bias stands. We're aware of this, and this is why this data point is >definitely not the only element that will influence the direction here. I frequently wished to see (only) modified prefs... and never realized I could sort on modification status. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Proposal to adjust testing to run on PGO builds only and not test on OPT builds
>* do we turn off builds as well? I had proposed just the tests, if we decide >to turn off talos it would make sense to turn off builds. Would turning off opt builds cause problems if you want to mozregression an opt build? And would this be an issue? (obviously it might be for opt-only failures, or trying to verify if a regression identified in mozregression for PGO was a PGO bug or now, though that could be checked at the cost of a build or 4 even if we don't build opt, probably). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: The upcoming C/C++ coding style change
>>If I'm understanding your situation correctly, I believe you can use rebase >>to update a whole queue of dependent patches, provided you have the >>format-source extension enabled. Ok. For anyone with mq patch queues, here's a sequence that should work. I hit a problem, and will file a bug on clang-format - the sequence below side-steps the bug. I found when I reviewed my queues in more detail that I only have maybe 6 active queues (mostly for specific projects) with perhaps 50-100 active patches; a number of other queues for areas I'm not actively working on at the moment (like my webrtc datachannels queue), and quite a few WIP queues for work I abandoned or decided to mothball, or for completed work (like webrtc import patch queues). Most of those date back to 2016 or earlier. At times in the past I've had multi-hundred patches for a few dozen active queues. That would have made this issue more problematic, time-wise. (Note: this sequence below could probably be automated. It's not worth my time to do so, however.) # First, ensure the tools are set up: # do these once; no need to do them again ./mach bootstrap ./mach clang-format # that forces clang-tidy/etc to be downloaded and setup! (bug) # Now here's how to reformat your patches: hg pull -u hg tip # record rev of tip of tree Now for each set of patches that can be applied together (assuming they're at 0-N of the current queue; if not adjust accordingly): hg qshow 0 | head 10 # Use this to find the parent rev this sequence was applied on. # In some cases that rev is gone (a patch that has since been pushed), # If it's not found, use PRE_TREEWIDE_CLANG_FORMAT instead. hg update -r hg qpush # resolve conflicts if any and hg qref # repeat hg qpush/resolve for all patches in the sequence (N) hg qser >/tmp/patch_names hg qfin -a # if not already on PRE_TREEWIDE_CLANG_FORMAT: hg rebase -d PRE_TREEWIDE_CLANG_FORMAT # resolve any conflicts before we reformat hg rebase -d # resolve conflicts # repeat N times: hg qimport -r tip -n # -n name is optional but recommended hg qpop Poof! that *should* be all. Thanks to jkew, pehrsons, ehsan for suggestions! -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: The upcoming C/C++ coding style change
>> tl;dr: I need to figure out how I'm going to dig out of the rebasing hole >> I'm now in, and could use advice/help, and I think the above sequence >> doesn't work for queues of dependent patches. > >If I'm understanding your situation correctly, I believe you can use rebase >to update a whole queue of dependent patches, provided you have the >format-source extension enabled. [snip] Thanks - that was the sequence I thought might work, after hearing from pehrsons that the rebase would reformat all the pushed patches. hg qfin -a is critical I think. If I wanted to optimize this, perhaps for each independent sequence: hg update -r ; hg qpush (*N); hg rebase -d PRE_TREEWIDE_CLANG_FORMAT (assuming this does what I expect); resolve any bitrots; hg rebase -d ; (hg qimport -r tip; hg qpop) (*N) Almost automatable (probably automatable; may not be worth it). Thanks! I'll let people know if it works -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: The upcoming C/C++ coding style change
>But still all is not lost here. When you do decide to do the manual >merging when you needed those patches, you would need to: > > * Update your working tree to the parent of the commit that did the >reformat. > * Apply your patch to that tree and reformat the tree. > * Diff the reformat commit and your current working directory. That >would give the reformatted diff. tl;dr: I need to figure out how I'm going to dig out of the rebasing hole I'm now in, and could use advice/help, and I think the above sequence doesn't work for queues of dependent patches. So in my 6 (current) repos on my main dev machine, I have ~150 mq patch queues with a total of ~3000-3500 patches (though perhaps only 2500 unique ones) - some ancient/abandoned, but most have (unfinished) work on some subject (not generally single bugs, but something like "datachannel", with some related patches and some independent patches, and usually with a tail of some temporary/abandoned/replaced patches). Perhaps not an optimal flow, but it has worked for me for many years. Some of these are under revision control (hg mq --init), and are synced with (shared) patch repos on hg.mozilla.org (for example webrtc.org update import patch queues, which typically got to 100-150 patches before squashing for landing). I also used that to share queues with my windows dev machine. Some queues I could drop, but probably the majority of queues and majority of patches within each queue I'll want to save, if I can. The "saving" could be done on an as-needed basis, they don't need to all be converted (and that helps with the issue of having to decide if they're still relevant). But I need a process that won't take crazy amounts of manual labor to save my patches. I'd guess I have about 1000 patches I may need to save, eventually, and several hundred in the short term. An example is a queue of work to redo Dispatch() to require an already_AddRefed<>; that has about 30 directory specific patches, hitting ~250 files (plus some cruft patches after those). Another is my profiler work queue, which has 42 patches, mostly for single issues (or 2 or 3 related patches for an issue), and perhaps 1/2 of them are still relevant. Dealing with normal bitrot when going back to one of these I'm used to, but the reformatting will make pretty much every patch be 100% conflicts I assume. So what are reasonable options here? Are any of the options an improvement on "deal with every patch totally manually with 100% conflicts"? (Which, over time as I go back to various queues, will use a lot of my time doing manual rebasing.) Note that the line above "Apply your patch to that tree" implies doing a manual un-bitrot if needed (like updating today), which is ok (as I said, it's the same as before the reformat; most will not need major surgery unless they're ancient). >From the discussion here, it sounds like some manual fixups might be needed after the tool runs as well. But doing this sequence for every patch sounds time-consuming. And a much bigger issue: I think that breaks down for a queue of patches that depend on each other - if I reformat the first patch, I can't go back to before the big reformat and just apply the second patch, as it may depend on the first. I'd have to (I guess) squash all the patches, which in most cases would be horribly worse than doing it 100% manually. So what are usuable options for a (long) queue of patches, which may be dependent? (Some aren't and so perhaps I could do the sequence above for each one, like the Dispatch queue with per-directory patches, but that's not the common case.) Can we come up with a way to partially script this, if there's a workable sequence? Or do I just have to do a ton of manual rebasing? Thanks for any help/ideas. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Performance profiling improvements #3
>I think sudo will let you have symbolicated kernel stacks which can be handy. $ sudo perf record ./firefox has a problem: "Running Nightly as root in a regular user's session is not supported." You can work around this with either: $ sudo perf record -g -F 999 -p (this only gets one process of firefox's N processes, though) or $ sudo perf record -g -F 999 sudo -u ./firefox -P whatever -no-remote However: you may find that the extra detail doesn't help much. Normally, I just use user-space profiling: $ perf record -g -F 999 ./firefox -P my-profile -no-remote $ perf script -F +pid > /tmp/firefox.perf Note that you'll want /proc/sys/kernel/perf_event_paranoid to be 1 (or less). To persist across reboots: sudo sh -c 'echo kernel.perf_event_paranoid=1 >> /etc/sysctl.d/99.local.conf' (or add to /etc/sysctl.conf, etc - whatever is correct for your system) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to implement and ship: WebP image support
>Are we bringing in a new third party library for this? (Seems like yes?) libwebp (see https://bugzilla.mozilla.org/show_bug.cgi?id=1294490) >Who else uses it/audits it? Does anyone else fuzz it? Is it in OSS-fuzz? >Are we fuzzing it? http://developers.google.com/speed/webp - Chrome uses it. They fuzz it (including with private fuzzing). It's in OSS-fuzz: see https://groups.google.com/a/webmproject.org/forum/#!topic/webp-discuss/aqHRxQqJpH0 I don't believe we're fuzzing the patches yet, but I imagine we will. >How does upstream behave? Do they cut releases or do they just have >continual development and downstreams grab random versions of it? How do we >plan to track security issues upstream? How do we plan to update it >(mechanically and how often)? You can see how they handle releases above. Version 1.0.0 was cut in April (though there were a number before then). See https://chromium.googlesource.com/webm/libwebp I don't know how they track sec issues; probably similar to other google/chrome/chromium projects. See https://bugs.chromium.org/p/webp/issues/list You can report issues as "Security" issues. > bz wrote: >> In the past, I believe we objected to adding WebP for various reasons. >> Do we feel that those reasons are now outweighed by the compat problems? (Personal opinion) Yes, unfortunately. And AV1F image format both isn't ready and isn't universally supported; it will take a while. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Better incremental builds - Tup backend available on Linux
>Tup is a modern build system that enables fast and correct builds. I'm >pleased to announce the availability of Tup for building Firefox on Linux. >Compared to Make, building with Tup will result in faster incremental build >times and reduce the need for clobber builds. See the in-tree docs for >details [1]. So this seems to be faster for "./mach build" if you have small changes to c++ code . However, "./mach build binaries" doesn't work at all. (Perhaps with this it's no longer necessary, but if so a message stating that would be good). >cd ~/.mozbuild && mach artifact toolchain --from-build linux64-tup This doesn't work... no "mach" in my path. Going to a tree and typing './mach artifact toolchain --from-build linux64-tup' works though. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Input Delay Metric proposal
>For in-the-wild input delay not specifically during pageload we also >measure INPUT_EVENT_RESPONSE_DELAY_MS[1] and (from Firefox 53-60 and then >resuscitated in 64) INPUT_EVENT_RESPONSE_COALESCED_MS[2] which record[3] >from the time the OS created the input event until the time when the >process is done with the event. Sure, those are interesting... though it's the tail that's generally the concerning part here - and most of the tail will be during-pageload. (and those that aren't can actually be interesting for other reasons). MID is intended for automation and to a lesser extent for investigation of responsiveness on specific pages, or for devtools (so developers can see how their page is working). It's arguable how useful this is in telemetry - though I think it might be worth tracking there, at least for a while to compare. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Enabling (many) assertions in opt builds locally and eventually Nightly
>On 9/20/18 5:59 PM, Andrew McCreight wrote: >> On Wed, Sep 19, 2018 at 5:44 PM Kris Maglione wrote: >> >>> On Wed, Sep 19, 2018 at 05:37:46PM -0700, Bobby Holley wrote: >>>> So, I don't think we need to do anything fancy with forking - we'd just >>>> need to capture stacks and send them via telemetry rather than as a crash >>>> report. This was the idea behind bug 1209131, which got pretty far along >>>> but never made it to the finish line. >>> >>> This would actually potentially even give us better information >>> than fork-and-crash, since we should be able to include JS >>> stacks in that setup too. We've never been able to do that in >>> ordinary crash reports, since breakpad doesn't know how to >>> unwind JS stacks, and we can't safely ask the JS runtime to do >>> it while we're crashing. >>> >> >> Though keep in mind that any stack that includes content JS is going to >> likely count as PII, so it would have to be hidden by default on Soccorro. > > >Please note that it would be illegal to collect such data >without asking for explicit user consent first. >The GDPR requires a "positive opt-in" mechanism: >https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/lawful-basis-for-processing/consent/ >Our current Telemetry permission is an opt-out mechanism. Right - this would need to be handled in a similar way to real crashes - pop a crashreporter dialog to let the user submit it. We just wouldn't kill the browser (and probably disable future semi-assertions until restart once we hit and report one to avoid bugging the user too much). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Enabling (many) assertions in opt builds locally and eventually Nightly
>On Thu, Sep 20, 2018 at 09:22:12AM +1000, Cameron McCormack wrote: >> On Thu, Sep 20, 2018, at 1:52 AM, Ehsan Akhgari wrote: >> > While it may be the case that we may need to be more stable for >> > MOZ_RELEASE_ASSERTs, I very much doubt that every single MOZ_ASSERT in our >> > codebase is actually a guaranteed to never fail, so promoting them all to >> > be enabled in something like Nightly is extremely dangerous in terms of the >> > stability of Nightly for users who are trying to use the browser to get >> > their normal work done. >> >> If it's truly useful to know whether Nightly users are failing these >> MOZ_ASSERT assertions, we could use Telemetry to report their failure >> instead of crashing.> >> (I wonder if we could collect all the same data, and use the same crash >> reporting infrastructure, for non-crashing crash reports like this.) > >On Linux and Mac, we could make MOZ_ASSERT fork and crash in the child, >and continue in the parent process. Oooh, tricky. (though threads aren't preserved in forks, but the current thread's stack is, so if we care about all threads, we might need to grab pointers to their stacks/etc and store pointers to the stacks before fork()). But even without that it would be good. And yes, we don't have to have it crash the browser (though if it later crashes we'd want to know about assertions we'd hit). Telemetry isn't quite appropriate for reporting it; it could however save a crash report at ASSERT(), then (probably) disable MOZ_ASSERT reports and let the browser continue (and it may well now really crash). This disabling would avoid the cascade effect, but still get us crashreports. This would be a bunch(?) of work in crashreporter - ted? It might be easy invoke a crashreport sampling, and continue, but I suspect it's not, and might be quite hard. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Enabling (many) assertions in opt builds locally and eventually Nightly
Ehsan wrote: >On Tue, Sep 18, 2018 at 1:30 AM Randell Jesup wrote: >> We already *need* to be stable in that case, given MOZ_RELEASE_ASSERT >> (and normal just-because crashes). And as best I can tell, we are stable >> (with regards to user profiles). Once upon a time we weren't (5(?) >> years ago?) > >I do come across MOZ_ASSERTs that trigger while browsing every once in a >while (using a debug build). I almost never file bugs for these, because >the issues are rarely reproducible, I often have little information that >would be helpful as to how exactly the assertion was triggered, and I often >am very actively working on something else and the assertion failure is >just getting in the way of me getting my work done. I sometimes have to >comment out a broken MOZ_ASSERT in my local build to be able to proceed. Good point, though if we have people actually browsing with these, we'll find these bogus MOZ_ASSERTs - and a bogus MOZ_ASSERT may mean we have a sec bug that we assume "can't happen", since it doesn't fail in mochitests. >While it may be the case that we may need to be more stable for >MOZ_RELEASE_ASSERTs, I very much doubt that every single MOZ_ASSERT in our >codebase is actually a guaranteed to never fail, so promoting them all to >be enabled in something like Nightly is extremely dangerous in terms of the >stability of Nightly for users who are trying to use the browser to get >their normal work done. Again, good point -- though I would absolutely start with "get developers to use this". If we're kicking a MOZ_ASSERT, someone should be either fixing it or removing the assert and dealing with it. So please file against whomever is maintaining that bit of code, and let them sort out why it would fail. >This is, I believe, the way that we advertise >Nightly these days: we advertise it as a mostly stable experimental version >of Firefox and we encourage people to download it to try out the latest and >greatest. Exposing that population to crashes as a result of promoting all >MOZ_ASSERTs to become MOZ_RELEASE_ASSERTs seems like a proposition that >needs to be backed by some data demonstrating that such a build would match >the current stability requirements of Nightly. Sure. >FWIW, I think your original proposal, having a way to opt into assertions >without a slow build locally, is extremely valuable. I could see myself >using that option intentionally even with the pain points I described >above, but for dogfooding rather than while working on a patch and such. Cool, thanks. I have the start of a patch to enable a version of this. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Input Delay Metric proposal
Problem: Various measures have been tried to capture user frustration with having to wait to interact with a site they're loading (or to see the site data). This includes: FID - First Input Delay -- https://developers.google.com/web/updates/2018/05/first-input-delay TTI - Time To Interactive -- https://developers.google.com/web/fundamentals/performance/user-centric-performance-metrics#time_to_interactive related to: FCP - First Contentful Paint and FMP - First Meaningful Paint -- https://developers.google.com/web/fundamentals/performance/user-centric-performance-metrics#first_paint_and_first_contentful_paint TTVC (Time To Visually Complete), etc. None of these do a great job capturing the reality around pageload and interactivity. FID is the latest suggestion, but it's very much based on watching user actions and reporting on them, and thus depends on how much they think the page is ready to interact with, and dozens of other things. It's only good for field measurements in bulk of a specific site, by the site author. In particular, FID cannot reasonably be used in automation (or before wide deployment). Proposal: We should define a new measure based on FID name MID, for Median Input Delay, which is measurable in automation and captures the expected delay a user experiences during a load. We can run this in automation against a set of captured pages, while also measuring related values like FCP and TTI, and dump this into a set of per-page graphs (perhaps on "areweinteractiveyet.com" :-) ). While FID depends on measuring the delay when the user *happens* to click, MID would measure the median (etc) delay that would be experienced at any point between (suggestion) FCP and TTI. I.e. it would be based on "if a user input event were generated this millisecond, how long would it be before it ran?" This would measure delay in the input event queue (probably 0 for this case) plus the time remaining until he current-running event for the mainthread finishes. This inherently assumes we measure TTI and FCP (or something approximating it). This is somewhat problematic, as TTI is very noisy. I have a first cut at TTI measurement (fed into profiler markers) in bug 1299118 (without the "no more than 2 connections in flight" part). Value calculation: Median seems to be the best measure, but once we have data we can look at the distributions on real sites and our test harness and decide what has the most correlation to user experience. We could also measure the 95% point, for example. In automation, there might be some advantage to recording/reporting more data, like median and 95%, or median, average, and 95%, and max. Another issue with the calculation is that it won't capture burstiness in the results well (a distribution would). Range measured over: We could modify the starting point to be when the first object that could be interacted with is rendered (input object, link, adding a key event handler, etc). This would be a more-accurate measure for web developers, and would matter only a little for our use. Note that getting content on the screen earlier might in some cases hurt you by starting the measurement "early" when the MainThread is presumably busy. Likewise, there might very well be alternatives to TTI for the end-point (and on some pages, you never get to TTI, or it's a Long Time). Using TTI does imply we must collect data until 5 seconds after the last "Long Task", and since some sites will never go 5 seconds without a long task, we'll need to upper-bound it (or progressively reduce the 5 seconds over time, which may help). Alternatively, we could use a shorter window, or put an arbitrary limit on it (5 seconds past 'loaded', or just to 'loaded'), etc. Issues: Defining the start and stop point, and the details around the exact way we calculate the result (I hand-wove about it above). Note that "longer" endpoints will result generally in better scores, since it would average over probably a longer tail where less is happening (presumably). OTOH if it ends at TTI on a "Long Task" (50+ms event), that rather implies that it was at least intermittently busy until then. If we want to start when something interact-able is rendered, there may be some work to figure that out. Note that this inherently is measuring the delay until the input event *starts* processing, not how long it takes to process (since there is no actual input event here). Once we have some experience with this, we could propose it for the Performance API WG. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Enabling (many) assertions in opt builds locally and eventually Nightly
>On 9/17/2018 1:31 PM, Botond Ballo wrote: >> One potential issue here is that an assertion failure in Nightly >> builds might be simultaneously very annoying (because it crashes the >> browser of Nightly users every time it occurs) and not very actionable >> (because no reliable STR can be found for it). > >I would also be concerned about data loss and profile corruption when an >assertion crashes the browser. I don't think we'll have a very easy time >convincing people to do anything non-trivial with Nightly if there's a >pretty good chance that they completely lose the fruits of their labour. > >I'd hope that this wouldn't be too frequent a problem, but it probably >depends on the assertion, and the state of the durability of our user >profiles. We already *need* to be stable in that case, given MOZ_RELEASE_ASSERT (and normal just-because crashes). And as best I can tell, we are stable (with regards to user profiles). Once upon a time we weren't (5(?) years ago?) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Enabling (many) assertions in opt builds locally and eventually Nightly
It may be possible to narrow down the performance-regressing MOZ_ASSERTs by pushing a series of Talos try runs with MOZ_ASSERT enabled for specific top-level directories - then take those with significant regressions, and recurse down a layer, lather, rinse, repeat to at least the leaf-dir level, or (if there are a lot in the dir) the file level. Then we can look with eyeballs and figure out what's up. It might also be possible to use trace APIs to measure the cost of each assertion... though the traces themselves might swamp most assertion costs. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Enabling (many) assertions in opt builds locally and eventually Nightly
There are quite a few things that may be caught by assertions by developers before committing, especially sec issues - but most developers don't run debug builds as their main local test builds, or for local browsing use, because they are almost unusably slow. And turning on compiler optimization doesn't actually help much here - the problem is mostly debug-only assertions and code that's debug only, such as bug 1441052 ("Don't do full grey-node checking in local debug builds"). These added checks may be useful for CI tests, but they make the builds unusable, so the vast majority of assertions don't run in the builds our developers are using. So enabling most of the existing MOZ_ASSERTs for local opt builds (minus the worst performance-limiting ones) would help developers catch bugs before landing them (and perhaps shipping them). It's a lot like using ASAN builds to browse - but with much less perf degradation on hopes. Even better, if we can make the overall hit to perf low enough (1%?), we could enable it for some/all Nightly users/builds. Or make "early nightly" (and developer builds) enable them, and late nightly and beta not. If we moved some of the most expensive checks to an alternative macro (MOZ_ASSERT_DEBUG()), we could (selectively?) enable MOZ_ASSERT checks in some opt builds. Alternatively we could promote some MOZ_ASSERTs to MOZ_ASSERT_NIGHTLY() (or MOZ_DIAGNOSTIC_ASSERT), which would assert in (all) debug builds, and in nightly opt builds - and maybe promote some to MOZ_ASSERT_RELEASE if we can take the perf hit, and they're in an important spot. And as a stepping-stone to the above, having local opt builds enable (most) MOZ_ASSERTs (excluding really expensive ones, like bug 1441052) even if the hit is larger (5, 10%) would also increase the likelihood that we'll catch things before we land them, or before they get to beta. (Note: --enable-release would turn this local-build behavior off - and anyone doing speed tests/profiling/etc needs that already, and probably we should have a specific enable/disable like --disable-extra-assertions). This wouldn't be all that hard - most of the work would be finding "expensive" assertions like bug 1441052. (and perhaps we should not continue to overload --enable-release for "I want to benchmark/profile this build") Enabling most MOZ_ASSERTs in automation tests on opt builds would also slightly increase our odds of finding problems - though this may not help much, as the same tests are being run in debug builds. The only advantage would be races may be more likely in the faster opt builds. It might be worth trying once we have this for a few weeks, and see if it helps or not. Note: I've discussed this concept with various people including bz in the past, and mostly have had agreement that this would be useful - thus my filing of bug 1441052. If we agree this is worth doing, we should file bugs on it and see what the effort to get this would be, and if we could ship some of this to users. Much like nightly-asan, this would shake out bugs we'll never find from crash-stats reports, and which can lead to critical sec bugs, and turn them into frequently-actionable bugs. If needed this can be enabled incrementally once the build/config infra and macros are in place; there are several options for that. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: C++ standards proposal for a embedding library
Ted wrote: >Honestly I think at this point growth of the C++ standard library is an >anti-feature. The committee should figure out how to get modules specified >(which I understand is a difficult thing, I'm not trying to minimize the >work there) so that tooling can be built to provide a first-class module >ecosystem for C++ like Rust and other languages have. The language should >provide a better means for extensibility and code reuse so that the >standard library doesn't have to solve everyone's problems. (and various others wrote things along the same lines, or similar concerns) I'm strongly in agreement. This is not going to help people in the long run, and may well be as Ted puts it an anti-feature that causes all sorts of problems we've chipped away at to re-rear their heads (ossified unsafe impls, lack of any improvements, etc). A good module system is a much more useful and liberating thing to do. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: PSA: pay attention when setting multiple reviewers in Phabricator
>>> A better solution would be to have in-tree metadata files providing >>> subscription rules for code review (e.g. a mapping of usernames to a list >>> of patterns matching files). Module owners would be responsible for >>> reviewing changes to these rules to ensure that automatic delegation >>> happens to the correct people. >> I still don't believe this would work well in practice. It would work, >> but would have frequent problem cases and often require a lot of updating. >at the end of the day tooling needs to ensure that reviewer has the >authority to approve changes. Well, that's an assertion. I'm not sure that the tools *needs* to do more to ensure it than today; I believe that some people believe it *should* do more. And perhaps they're right. >i don't think the issues you've raised are show stoppers; it's certainly a >system we will iterate on over time. >keep in mind _how_ we work is also something we should be iterating on too. Agreed. And if something can work better, great; but I'm also leery of adding a lot of process or gatekeepers, especially when you need to touch a module with few official peers, who might not be available when you need a rubber stamp of a nit fix. (Initial review has nits; revise, now the reviewer is on PTO or simply away for the weekend or a holiday and you have a big landing gated on this, or a sec issue, or will miss a release if you can't land before uplift/soft-freeze, etc, etc, etc) An alternative is more guidelines (and training, especially for new devs) for reviewers/peers/devs to follow, and deal with any oversteps on a case-by-case basis, and if there are continual issues then add enforced process. Right now it's kinda learn-by-watching-other-devs - which works, but makes it hard to be sure you learned the right things. >the current system is extremely coarse - everyone with scm-level-3 can >approve any change tree-wide. >there is a strong desire to make this finer grained I believe that; though I'd be interested to know with who, the reasons, and the observed problems, and what other solutions have been considered. Of course, I don't need to know these; I'm just interested as while I've seen occasional fubars, I haven't seen persistent problems that would require that (especially over improved guidelines). Also, for example, one could add these gate-checks with overrides that devs can use at their discretion (like for nits). I'm sure added checks can avoid some fubars - but I'm also sure it will cause added friction to development, extra load on some people, and slow some things down. >> [snip] >> Some modules (i.e. DOM) already to have a hard requirement for peer >> review. That should be continued and should be enforced as it is now, >> and it sounds like Lando will (does?) support that. If another module >> wants to enforce it we can flip a bit in the list of peers and have it >> enforced. >the enforcement you're referring to is controlled by a hardcoded list of >peers inside a mercurial hook. > >this doesn't scale anywhere close to our needs, and all of the exact same >concerns you raise with regards to in-tree ownership tracking applies >(however it would be worse as it's imposed by a system separate from the >review/landing tooling). Agreed. Moving those to in-tree lists is certainly a win over current. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)
>On 13/07/2018 04:55, Randell Jesup wrote: >> Correct - we need to have observers/what-have-you for >> background/foreground state (and we may want an intermediate state or >> two - foreground-but-not-focused (for example a visible window that >> isn't the focused window); recently-in-foreground (switching back and >> forth); background-for-longer-than-delta, etc. >> >> Modules can use these to drop caches, shut down unnecessary threads, >> change strategies, force GCs/CCs, etc. >Also note that dealing with the "importance" of a page is not just a >matter of visibility and focus. There are other factors to take into >account such as if the page is playing audio or video (like listening to >music on YouTube), if it's self-updating and so on. Absolutely >The only mechanism to reduce memory consumption we have now is >memory-pressure events which while functional are still under-used. We >might also need more fine grained mechanisms than "drop as much memory >as you can". This is also very important for GeckoView -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)
>On Thu, Jul 12, 2018 at 08:56:28AM -0700, Andrew McCreight wrote: >>On Thu, Jul 12, 2018 at 3:57 AM, Emilio Cobos Álvarez >>wrote: >> >>> Just curious, is there a bug on file to measure excess capacity on >>> nsTArrays and hash tables? [snip] >I kind of suspect that improving the storage efficiency of hashtables (and >probably nsTArrays too) will have an out-sized effect on per-process >memory. Just at startup, for a mostly empty process, we have a huge amount >of memory devoted to hashtables that would otherwise be shared across a >bunch of origins—enough that removing just 4 bytes of padding per entry >would save 87K per process. And that number tends to grow as we populate >caches that we need for things like layout and atoms. Hash tables are a big issue. There are a lot of 64K/128K/256K allocations at the moment for hashtables. When we started looking at this in bug 1436250, we had a 256K, ~4 128K, and a whole bunch of 64K hashtable allocs (on linux). Some may be smaller or gone now, but it's still big. I wonder if it's worth the perf hit to realloc to exact size hash tables that are build-once - probably. hashtable->Finalize()? (I wonder if that would let us make any other memory/speed optimizations if we know the table is now static.) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: PSA: pay attention when setting multiple reviewers in Phabricator
>On 05/07/2018 18:19, Mark Côté wrote: >> I sympathize with the concerns here; however, changing the default would >> be a very invasive change to Phabricator, which would not only be complex >> to implement but troublesome to maintain, as we upgrade Phabricator every >> week or two. Makes sense. >> This is, however, something we can address with our new custom >> commit-series-friendly command-line tool. We are also working towards the >> superior solution of automatically selecting reviewers based on module >> owners and peers and enforcing this in Lando. > >Automatically selecting reviewers sounds like a huge improvement, >particularly for people making changes who haven't yet internalised the >ownership status of the files they are touching (notably any kind of >first-time or otherwise infrequent contributor to a specific piece of >code). So I'm very excited about this change. Agreed that this is an improvement for those who are new to mozilla development, though the bugzilla reviewer suggestions are also useful today. This would be the equivalent of adding "Select for me" in Bugzilla. >That said, basing it on the list of module owners & peers seems like it may >not be the right decision for a number of reasons: > >* The number of reviews for a given module can be very large and being >unconditionally selected for every review in a module may be overwhelming. > >* The list of module owners and peers is not uniformly well maintained (in >at least some cases it suggests that components are owned by people who >have not been involved with the project for several years). Although this >should certainly be cleaned up, the fact is that the current data is not >reliable in many cases. This can be cleaned up (and is being so), but will never be perfect. Also, unless you're amazingly careful, this will often route reviews to people on PTO, sick, in the "wrong" time zone, on a holiday day, or simply too busy to review in a reasonable period due to whatever. Some of these are known ahead of time, others aren't. And then there are the "I need a review now to fix something" where the automatic selection might be someone who won't be available to review it for 12 hours, or 3 days - or 3 weeks, requiring noticing this and having someone manually correct it. >* Oftentimes there is substructure within a module that means that some >people should be reviewers in certain files/directories but have no >knowledge of other parts. Or you spend a lot of time deciding and updating who can review what bits - and it doesn't always match the directory heirarchy! >* It usually desirable to have people perform code review for some time as >part of the process of becoming a module owner or peer. Absolutely. >A better solution would be to have in-tree metadata files providing >subscription rules for code review (e.g. a mapping of usernames to a list >of patterns matching files). Module owners would be responsible for >reviewing changes to these rules to ensure that automatic delegation >happens to the correct people. I still don't believe this would work well in practice. It would work, but would have frequent problem cases and often require a lot of updating. In-tree metadata to support suggestions (and to support "select for me" when someone explicitly asks) is good, I think. Note that not all modules are directly related to directories in the tree, so something out-of-tree or some escape valve in-tree is needed to record those. I think automatically selecting reviewers always (as is implied, but it isn't clear that this is the plan - Mark?) is quite problematic, and enforcing reviewers-are-listed-in-tree also has real problems (and may require too-frequent manual intervention) such as not supporting peers assigning reviews to non-peers to train them (to be peers), for cases where a the best person to review a patch isn't a full module peer but an SME on the issue/code in question, etc. These problems can be gamed/wallpapered (assign a review to both the non-peer and a peer who doesn't really review or reviews-the-review, etc), but that adds pain and time and unneeded process. Some modules (i.e. DOM) already to have a hard requirement for peer review. That should be continued and should be enforced as it is now, and it sounds like Lando will (does?) support that. If another module wants to enforce it we can flip a bit in the list of peers and have it enforced. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)
>On 07/12/2018 11:08 PM, Randell Jesup wrote: >> We may need to trade first-load time against memory use by lazy-initing >> more things than now, though we did quite a bit on that already for >> reducing startup time. > >One thing to remember that some of the child processes will be more >important than others. For example all the processes used for browsing >contexts in the foreground tab should probably prefer performance over >memory (in cases that is something we can choose from), but if a >process is only used for browsing contexts in background tabs and isn't >playing any audio or such, it can probably use less memory hungry >approaches. Correct - we need to have observers/what-have-you for background/foreground state (and we may want an intermediate state or two - foreground-but-not-focused (for example a visible window that isn't the focused window); recently-in-foreground (switching back and forth); background-for-longer-than-delta, etc. Modules can use these to drop caches, shut down unnecessary threads, change strategies, force GCs/CCs, etc. Some of this certainly already exists, but may need to be extended (and used a lot more). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)
>I do hope that the 100 process figures scenario that was given is a worse case >scenario though... It's not. Worst case is a LOT worse. Shutting down threads/threadpools when not needed or off an idle timer is a Good thing. There may be some perf hit since it may mean starting a thread instead of just sending a message at times; this may require some tuning in specific cases, or leaving 1 thread or more running anyways. Stylo will be an interesting case here. We may need to trade first-load time against memory use by lazy-initing more things than now, though we did quite a bit on that already for reducing startup time. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Proposed W3C Charter: Web Performance Working Group
>Adding to what Tom said... > >1. "Web developers want the ability to observe the performance >characteristics of their applications" - they want to do so, but >*should* they be allowed to do so? The API would give access to deep >performance data that could be used for all sorts of nefarious purposes >(profiling, fingerprinting, probing for vulnerabilities, etc.). The extreme version of this is what Vlad and Benoit (Facebook) have proposed in WICG, which is an interface to profiling data for the page (origin): https://github.com/vdjeric/js-self-profiling Discussion (mozilla) here: https://github.com/mozilla/standards-positions/issues/92 One can understand why they'd *want* to be able to profile their code in-the-field. Exposing this today would be have serious same-origin and Spectre impacts; in a Fission world these problematic impacts would be (more) limited though perhaps not "safe". (Implied in the current Gecko Profiler impl is that other processes could affect how fast your origin's process runs, and thus how much progress is made between profiler 'ticks',leading to some leakage of information cross-orgin between processes. How large an impact this is I'm not 100% sure at this point. If we changed sampling to be runtime-based instead of wallclock-based, this (mostly) hides the impact of other processes, though secondary effects still exist (cache impacts, etc). Making it runtime-based would be a largish change... -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)
>On 7/11/18 5:42 AM, David Bruant wrote: >> I've seen this information of 100 content processes in a couple places but >> i haven't been able to find the rationale for it. How was the 100 number >> picked? > >I believe this is based on telemetry for number of distinct sites involved >in browsing sessions. As an example, 10 randomly chosen tabs in Chrome site isolation (a few months ago) yielded ~80 renderers (Content processes). Some sites generate a lot; that list of 10 included some which likely don't generate more than 1 or 2: google.com, mozilla.org, facebook login page, wikipedia (might spawn a few?). >> Would 90 prevent a release of project fission? > >It would make it harder to ship to users, yes... Whether it "prevents" >would depend on other considerations. It's a continuum - the more memory we use, the more OOMs, the worse we'll look (relative to Chrome), the larger impact on system perf, etc. There's likely no hard line, but there may be a defined "we need to get at least here" line, and for now that's 100 apparently (I wasn't directly involved in picking it, so I don't know how "hard" it is). We'll have to do more than just limit process sizes, but limiting process sizes is basically table stakes, IMO. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)
>Welcome to the first edition of the Fission MemShrink newsletter.[1] This is awesome and critical. I'll note (and many of you know this well) that in addition to getting rid of allocations (or making them lazy), another primary solution is to move data out of the Content processes, and into the master process (or some other shared process, if that's advisable for security or other reasons), and access the data over IPC. Or you can move it to a shared memory block (with appropriate locking if not static). For example, on linux one of our worst offenders is fontconfig; Chrome for example remotes much of that to the master process. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: open socket and read file inside Webrtc
>> On 05/07/2018 10:16, amantell...@gmail.com wrote: >> > I want to open a file inside webrtc core part. >> >> If you could extend the reasons on the why it would allow us to help you. >> >> What is your ultimate objective? why do you think you need to open a >> file inside webrtc core? >We are modifying the webrtc library to use another transport protocol (based >on ICN). >The problem is that in this preliminary phase we have configure the socket >using static parameters inside a file. >The new socket is implemented in a library that communicates via TCP with an >external daemon. >I want to use IPC, but I don’t have examples about this approach. If this is just for personal experimentation, you can locally disable the sandbox via about:config, and directly (locally) modify the media/mtransport code to use your transport protocol instead of UDP via IPC to the SocketTransportThread in the main process. Note that there's a lot of code that implements ICE to determine what ports are open, etc. That may complicate what you're doing. If you want to do more than personal/local experimentation, much more extensive discussions and work would be required. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Compile webrtc from source in firefox
>Il giorno giovedì 5 luglio 2018 10:17:32 UTC+2, amant...@gmail.com ha scritto: >> Hi, >> I don't understand which webrtc code is used by firefox builder. >> I need to modify the webrtc part and recompile it to be used by firefox. >> any help? >I tried to break webrtc to understand if it is compiled or not (in >particular media/webrtc/trunk/webrtc/pc/channel.cc), but i did not receive >errors. We don't use (or compile) all the imported webrtc code. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Windows Address Sanitizer enabled on trunk
>Bug 1360120 on inbound enables Windows ASan builds and tests on trunk branches. ... >This platform has taken several years to stand up. Thank you to >everyone who helped out, especially Ehsan for getting this started and >Ting-Yu for working through a bunch of hurdles on automation. > >Happy sanitizing! A late response, but this is truly awesome. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Removing tinderbox-builds from archive.mozilla.org
>On 5/11/18 7:06 PM, Gregory Szorc wrote: >> Artifact retention and expiration boils down to a >> trade-off between the cost of storage and the convenience of accessing >> something immediately (as opposed to waiting several dozen minutes to >> populate the cache). > >Just to be clear, when doing a bisect, one _can_ just deal with local >builds. But the point is that then it takes tens of minutes per build as >you point out. So a bisect task that might otherwise take 10-15 minutes >total (1 minute per downloaded build) ends up taking hours... Also (as others have pointed out) going too far back (often not that far) may run you into tool differences that break re-building old revs. Hopefully you don't get variable behavior, just a failure-to-build at some point. I'm not sure how much Rust has made this worse. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to ship: media-capabilities
>If my understanding is correct, Media Capabilities does expose quite a >larger fingerprinting surface in practice. While it may have been >theoretically possible for all trackers to gather statistics on video >playback for each configuration, the only scripts that could practically >carry out those attacks without degrading user experience would have been >video providers. This will be especially true if browsers start blocking >autoplay by default (https://bugzilla.mozilla.org/show_bug.cgi?id=1376321), >since users will never interact with media elements from fingerprinting >scripts. With the Media Capabilities API, it seems that a script like >fingerprintjs2 (https://github.com/Valve/fingerprintjs2) could run through >a big list of types/codecs and retrieve device information regarding >smoothness and energy efficiency with relatively little overhead? Probably so, yes. We could reduce but not eliminate the exposure by rate-limiting requests (perhaps even on a sliding scale, allowing a small number before delays are introduced). This is likely insufficient as a mitigation, however. >If autoplay is eventually blocked by default could we gate the response of >this API on user interaction with the media element? That might be possible, but if so it should be discussed in the spec and how to get "real" data after user interaction. (Perhaps giving fake data until user interaction, but then one needs to warn developers about this, and how to get real data when interaction occurs reliably.) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Is super-review still a thing?
>On Thu, Apr 26, 2018, at 12:41 AM, Mark Côté wrote: >> How we might use blocking reviewers in our workflow is still open, but >> it could be used for training up reviewers, in which case the trainee >> would be a regular reviewer and the module peer/owner would be a >> blocking reviewer. > >It's not uncommon for me to submit patches for review from multiple people >where I require all reviewers to sign off on the patch, usually because I >ask them to look at different parts of the patch. I don't think I have >ever sent a patch to get review from more than one person with the >intention of landing it if only one person OKs it. (I'd probably needinfo >people to get that kind of feedback.) So ignoring potential new workflows >like the training one you mention, I think I would exclusively use blocking >reviewers. It's good to know that Phabricator will help me avoid >accidentally landing patches when not all of my reviewers have signed off. Agreed... but it sounds like they're not quite sure how this will be used. I'm concerned that developers may be confused and just as for review by N people, not realizing it can be landed when one of them r+'s. (Especially if the developer doesn't frequently use multiple reviewers.) At minimum, if this how it works, there will been to be clear communication about when to use this - or (better!) to switch the default to blocking, and have the unusual/more-work-to-ask-for case be any-of. Once in a blue moon I'll ask for a fast review from any of N people, then cancel the r?'s once i have 1 r+. but this is really rare, and mostly when there's a time deadline to land something ASAP (sec issue, or to get in ahead of a merge date). 99% of the time when I ask for N people, I need r+ from all of them. On the other side, I do know that the build peers (IIRC) us a shared reviewing setup, where most bugs are thrown in a pool for any of them to review. But that's not the normal workflow for any other group I know of in Mozilla, at least in Firefox. (I don't know them all, though.) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to unship: URL.createObjectURL(MediaStream)
date=2018-04-18&keys=__none__!__none__!__none__&max_channel_version=beta%252F60&measure=USE_COUNTER2_DEPRECATED_URLCreateObjectURL_MediaStream_PAGE&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-03-04&table=0&trim=1&use_submission_date=0 > >[2] >https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-03-01&keys=__none__!__none__!__none__&max_channel_version=beta%252F59&measure=USE_COUNTER2_DEPRECATED_URLCreateObjectURL_MediaStream_PAGE&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-11-13&table=0&trim=1&use_submission_date=0 -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Phabricator and Bugzilla
>As I indicated, those posts go into detail on why we are avoiding both >comment and more complicated flag mirroring. > >Mark There's no obvious discussion of "flags" in the linked discussions you gave; I find only a couple of references to "flag" - in a question from gps. Given how long the thread is (52 posts), perhaps you can point to something more specific? Thanks >On Sat, Mar 31, 2018 at 10:14 AM, Ben Kelly wrote: > >> On Sat, Mar 31, 2018, 10:09 AM Mark Côté wrote: >> >>> Regarding comment and flag mirroring, we've discussed this before: >>> >>> https://groups.google.com/d/msg/mozilla.dev.platform/ >>> Y8kInYxo8UU/e3Pi-_FpBgAJ >>> https://groups.google.com/d/msg/mozilla.dev.platform/ >>> Y8kInYxo8UU/tsF7UfxvBgAJ >>> >>> Given that Phabricator is still new, I don't see any reason to reopen >>> that discussion at this point, aside from noting that we have work in >>> progress to include Phabricator requests in BMO's My Dashboard and >>> notifications indicator (https://bugzilla.mozilla.org/ >>> show_bug.cgi?id=1440828). >>> >> >> What about comment mirroring? On my mobile so I haven't read all the past >> threads, but my recollection is that your team did not want to implement >> that feature. Personally, this is a huge concern for me. >> >> Thanks. >> >> Ben >> >> >>> As for interdiffs, feel free to file a bug with any problems you see. We >>> have a good relationship with upstream and can pass those on. Similarly >>> with method names (which has been mentioned before but I can't find where >>> at the moment). >>> >>> There is official documentation at https://secure.phabricator. >>> com/book/phabricator/ which is linked from our Mozilla-specific docs ( >>> http://moz-conduit.readthedocs.io/en/latest/phabricator-user.html) which >>> in turn is linked in the left-hand menu in Phabricator. We can expand our >>> own docs as needed if there are areas that are particularly confusing due >>> to, say, expectations carried over from our other code-review tools. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: testing event targets for memory leaks
>Hi all, > >I recently landed some test infrastructure for testing event targets for >memory leaks. This was part of fixing my service worker memory leak in bug >1447871. I wanted to let people know this existed and also ask for help >writing tests for more event targets. > >To repeat, I need help writing more tests. Please see the list of untested >APIs at the bottom of this email. >There are many, many event targets in the system, though. Using searchfox >to look for subclasses of DOMEventTargetHelper I came up with a list of >event targets that are not currently tested. Some of these are deprecated >or not exposed. But a lot of these are web exposed. It would be really >great to get this kind of simple leak test written for these APIs. I'm surprises that DOMDataChannel wasn't found: nsDOMDataChannel.h: class nsDOMDataChannel final : public mozilla::DOMEventTargetHelper, perhaps you were looking for "public DOMEventTargetHelper"? I also find nsScreen and nsDOMOfflineResourceList using mozilla::DOMEventTargetHelper Are there any events implemented in JS that we need to worry about? For example, there are a lot of events (and a number of objects) defined for WebRTC as part of dom/media/PeerConnection.js -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Faster gecko builds with IceCC on Mac and Linux
>On 1/16/18 2:59 PM, smaug wrote: >Would it be possible that when I do an hg pull of mozilla-central or >mozilla-inbound, I can also choose to download the object files from the >most recent ancestor that had an automation build? (It could be a separate >command, or ./mach pull.) They would go into a local ccache (or probably >sccache?) directory. The files would need to be atomically updated with >respect to my own builds, so I could race my build against the >download. And preferably the download would go roughly in the reverse order >as my own build, so they would meet in the middle at some point, after >which only the modified files would need to be compiled. It might require >splitting debug info out of the object files for this to be practical, >where the debug info could be downloaded asynchronously in the background >after the main build is complete. Stolen from a document on Workflow Efficiencies I worked on: Some type of aggressive pull-and-rebuild in the background may help by providing a ‘hot’ objdir that can be switched to in place of the normal “hg pull -u; ./mach build” sequence. Users would need to deal with reloading editor buffers after switching, but that’s normal after a pull. If the path changes it might require more magic; Emacs could deal with that easily with an elisp macro; not sure about other editors people use. Keeping paths to source the same after a pull is a win, though. Opportunistic rebuilds as you edit source might help, but the win is much smaller and would be more work. Still worth looking at, especially if you happen to touch something central. We'd need to be careful how it interacts with things like hg pull, witching branches, etc (defer starting builds slightly until source has been unchanged for N seconds?) I talked a fair bit about this with ted and others. The main trick here would be in dealing with cache directories, and with sccache we could make it support a form of hierarchy for caches (local and remote), so you could leverage either local rebuilds-in-background (triggered by automatic pulls on repo updates), or remote build resources (such as from the m-c build machines). Note that *any* remote-cache utilization depends on a fixed (or at least identical-and-checked) configuration *and* compiler and system includes. The easiest way to acheive this might be to leverage a local VM instance of taskcluster, since system includes vary machine-to-machine, even for the same OS version. (Perhaps this is less of an issue on Mac or Windows...). This requirement greatly complicates things (and requires building a "standard" config, which many do not). Leveraging local background builds would be much easier in many ways, though also less of a win. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Hiding 'new' statements - Good or Evil?
[SNIP] >> If foo->bar() can mutate the lifetime of foo, of course you must take a >> strong >> reference to foo. Nothing about auto or range-for changes this. > >What auto does is make it really hard to tell whether a strong reference is >being taken. > >> If you don't understand your lifetimes, you get bugs. > >Fully agreed. The discussion is about whether auto helps or hinders that >understanding. The answer is that it depends on the surrounding code, from >where I sit... > >-Boris So, where do we go from here? It's clear that auto is not as safe as some/many believe it to be; it can as Boris (and smaug and Xidorn) say hide lifetime issues and lead to non-obvious UAFs and the like. Some uses are certainly safe, but it isn't always obvious looking at a patch (per above), requiring more code investigation by the reviewer. If the resultant type isn't obvious, then using it isn't helping comprehension. When reading the code in the future, if the reader has to do non-trivial investigation just to know what's going on, it's hurting our work. If the win is to avoid typing I'd say it's not worth the risk. Or allow it, but only with commentary as to why it's safe, or with rules about what sort of types it's allowed with and anything other than those requires justification in comments/etc. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Bigger hard drives wanted (was Re: More ThinkStation P710 Nvidia tips (was Re: Faster gecko builds with IceCC on Mac and Linux))
>On 11/7/17 4:13 PM, Sophana "Soap" Aik wrote: >For the work I do (e.g. backporting security fixes every so often) I need a >release tree, a beta tree, and ESR tree, and at least 3 tip trees. That's >at least 150GB. If I want to have an effective ccache, that's about >20-30GB (recall that each objdir is 9+GB!). Call it 175GB. > >If I want to dual-boot or have a VM so I can do both Linux and Windows >work, that's 350GB. Plus the actual operating systems involved. Plus any >data files that might be being generated as part of work, etc. I've "solved" this by having a 2T rotating disk for the stuff I don't use constantly - release and ESR trees, local backups, if need be I'll move other large things there (media files, RR storage which is currently in ~/.rr) I have 4 inbound trees (one dedicated to ASAN) and head/beta trees, plus a couple of "mothball" trees for reference from old instances of alder (those could be moved, though I trust rotating disks far less than SSD. That said, I have had a (personal/retail) SSD die.) Right now on my ~350GB Linux /home partition (there's a windows one too, though I rarely use it) I have ~220GB used. (there's also a 50GB / partition). src/mozilla is 120GB (including objdirs, though I kill them fairly aggressively if they're out-of-date). I should move my final aurora repo to rotating disk.. I probably am not giving anywhere near enough space to ccache, though. Rotating disks are cheap (and easy if you have a desktop; less so though not horrible if you have a a laptop, especially with a dock). They don't necessarily solve Boris's problem, however. He could really use a 1TB SSD I suspect. When I got my current laptop, I asked for some options I saw on Lenovo's site that weren't the default config. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Website memory leaks
>On Thu, Nov 02, 2017 at 05:37:30PM +0200, smaug wrote: >>This has been an issue forever, and there aren't really good tools on any >>browser, as far as >>I know, for web devs to debug their leaks. >>Internally we do have useful data (CC and GC graphs and such), but would need >>quite some ux skills to design some good >>UI to deal with leaks. Also, the data to deal with is often massive, so the >>tool should be implemented that in mind. > >We do have memory tools in devtools now, and the dominator tree in >particular can provide some useful data for tracking down leaks. But those >tools aren't especially well-maintained at the moment, and I honestly think >they're too flaky at this point to recommend to webdevs for general use :( > >It would be nice if we could prioritize them again. Also, tools for developing a page are one thing, but usage in the field is another. Ability to know in a loaded page about memory use (and to know about usage of an embedded iframe) would give web devs information and a way to put their own pressure (and maybe limits) on. Plus, most devs unless they're dealing with a report about a problem won't go looking. Something that proactively can poke them is far more likely to get action. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Website memory leaks
>Many of the pages causing these leaks are major sites, like nytimes.com, >washington post, cnn, arstechnica, Atlantic, New Yorker, etc. ... >Perhaps we can also push to limit memory use (CPU use??) in embedded >ads/restricted-iframes/etc, so sites can stop ads from destroying the >website performance for users over time. I've often seen ads taking >300MB-2GB. So in support of this general concept of limiting ad memory use: https://www.scientificamerican.com/article/should-iconic-lake-powell-be-drained/ doesn't leak if you load it -- until you scroll down, and the ads load. Then it leaks forever... to the tune of 1GB leaked in 5-10 minutes. Differential about:memory reports show that what's primarily leaking are ads, in particular at this moment: 400.25 MB (43.45%) -- detached 398.81 MB (43.30%) -- window(https://tpc.googlesyndication.com/safeframe/1-0-13/html/container.html) seems like the worst culprit, plus 406.69 MB (44.15%) -- top(https://www.scientificamerican.com/article/should-iconic-lake-powell-be-drained/,id=NNN) 327.79 MB (35.59%) -- js-zone(0xNNN) Other ads have leaked a few MB to 75MB each. Also, as soon as I scrolled down CPU use went from ~0% for the process to ~20% (on a 4-thread/core AMD CPU). Worse yet (perhaps bug 1410381??) when I hit reload CPU use dropped (not to 0). 10 sec later, Mem use climbed 3.7GB to 4.5GB. then dropped to 3.7GB and climbed back to 4.5GB. Then dropped to 2.7GB - barely above where it was before I scrolled down - and stayed stable. (Note: I hit reload with the current position near the bottom with 1 ad visible (no video). Using an additional GB+ of memory in order to free 1GB of memory seems... excessive. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Website memory leaks
>about:performance can provide the tab/pid mapping, with some performance >indexes. >This might help solve your issue listed in the side note. mconley told me in IRC that today's nightly has brought back the PID in the tooltip (in Nightly only); it was accidentally removed. about:performance can be useful, but 3 tabs vs all tabs is too coarse for me, and things like "site leaked memory and is slowing CC" I presume doesn't show up in the 'heat' for the tab there. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Website memory leaks
moving (more) security-sensitive and/or crashy services into separate processes -- all of which come into play here. Side-note: removing the process-id from tabs in mult-e10s has made it harder for me to hunt down offending tabs, since I track down runaway memory/CPU in things like Process Explorer. (Usually after I notice that the browser has gotten slow/non-responsive) Totally reasonable, though I'd like to see the ID back on Nightly at least - makes starting GDB against the right content process *much* easier, and I don't have to limit the browser to 1 to avoid the problem. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Visual Studio 2017 coming soon
>2017-10-30 19:19 GMT+01:00 Kris Maglione : > >> Our static analysis tools are pretty good at catching a lot of lambda >> capture bugs at this point, though. I'd be much less comfortable using them >> if they weren't. >> >> It's probably worth considering whether we need to write static analysis >> tools for a new feature before we turn it on, though... > >We can probably help with introducing more static analysis to avoid >incorrect usages of C++{11,14,17} features. Sure - but in most cases we've only realized that a static-analysis bit was needed after hitting and solving a few (or a bunch of) crashes/sec bugs -- static-analysis tends to be reactive. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Default Rust optimization level decreased from 2 to 1
>On 2017-10-25 1:34 PM, Gregory Szorc wrote: >> Adding --enable-release to your mozconfig (the configuration for builds we >> ship to users) enables -Copt-level=2. (i.e. we didn't change optimization >> settings for builds we ship to users.) > >I've added a note about this to our benchmarking instructions at >https://developer.mozilla.org/en-US/docs/Mozilla/Benchmarking#Rust_optimization_level Note that this means that profiles taken with local builds will be "off" compared to Nightly/etc, unless you rebuild with extra options. How much off? probably not a lot unless you're super-focused on Rust code, but that's just a guess. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to ship: New default action, horizontal scroll, of wheel with Shift key (except Firefox for macOS)
>SGTM. BTW, bug 143038 was filed 16 years ago. Is that a bugzilla record for >oldest fixed bug? Not even close I think... a couple of years ago I remember some low-4-digit bugs getting fixed (maybe even a 3-digit?) :-) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Changes to tab min-width
>> I find (perhaps because I'm a mozilla user, and a tab-hoarder) that >> Chrome's UI is DREADFUL for anyone with lots of tabs - and probably >> intentionally to push users into closing them, since large numbers of >> tabs simply doesn't work as well in Chrome as in Firefox. So mimicing >> their behavior is NOT a good idea, IMHO. > >I'm one of the people (a minority in this thread, it appears) who prefers >Chrome's behavior and who'd like a skinny-tab option in Firefox, though not >necessarily the default option. I use all the major browsers, typically >with a dozen or more tabs open in the browser I'm using for my primary work >at any given time. I vastly prefer Chrome's tab behavior for several >reasons and rely on it as my primary browser in part because of it. Sounds like a good option for an extension. >I believe the complaint of "XYZ pixels is too narrow because it hides >necessary information in the tab" misses another point. For me, having tabs >vanish off one side or the other of the tab strip is a worse omission of >important information. I have enough tabs that there's No Way they can even be visible as close boxes (and it's useless to me when it gets down to favicon). And I get that you have a different set of behaviors/preferences. An un-discoverable feature of the Awesome Bar is "%url-fragment/title-fragment". It will show you completions from the list of open tabs only. This (and using scroll-wheels to spin quickly through a overflowing tab bar) make large numbers of tabs feasible without going the huge number of windows route. I think we could do skinnier tabs if we retained the ability to see what they are (not just favicon) easily. The more I think of it, the more I like the dock-like expand-what's-near-the-mouse idea - I wonder how easy it is? >I have typically navigate my 20 or 30 or 40 tabs mostly by keyboard, >cycling one way or the other across the tab strip, and for me the spatial >arrangement is very important (as is tab-switching speed). I find spatial organization is also quite useful, but I orient myself to it visually. I never use the keyboard nav - don't even know what the bindings are. :-) >There is an argument to be made for making life easier for people moving to >Firefox from Chrome, which clearly is an ambition in the current Firefox 57 >Quantum push. I don't have any data about how widespread my preferences are >or how much of a barrier it is adjusting to Firefox's disappearing tabs, >but this heavy Chrome user prefers Chrome's approach. > >I'm not trying to argue that my preferences are universal. But I expect >Google made its choices pretty carefully and not as a way to punish people >for using too many tabs. I use lots more tabs than the average user, but I >expect the general trend is drifting toward more and more tabs, so graceful >handling of overflow will become important for a larger fraction of people >as time goes on. Having a option for tab-handling might be good; it's a primary way users interact with the browser -- and *no* single way is the Right Way for all users. Extensions can help here, modulo most users don't look for them. You could also offer that (extension or option) when they import profile data from Chrome, or put a link in Prefs to a Chrome-like tab WebExtension (Prefs *could* have links to relevant Extensions). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Changes to tab min-width
>On Fri, Oct 6, 2017 at 12:57 AM, Lars Hansen wrote: > >> even if I don't exactly remember the ID I'm looking for I can narrow >> it down to one or two tabs and then hover if I need to. >> >> Many other sites also have tabs that can be distinguished >> from the first few letters - if you can see them. (Too many do not.) >> Indeed now that I have this pref to play with I have set it to 200 and this >> seems even better. > >I certainly feel your pain wrt bugzilla tabs, but outside our community's >unique intense use-case it's not that common amongst release users (see >stats given earlier about high percentages with < 10 and < 20 tabs). I don't know about you, but a common case for me is go-to-news-site, open-N-articles-in-tabs, read-articles (maybe ;-) ). Probably learned that in the days of less bandwidth; stuff can pull down in the background. Saves a lot of go-back, wait-for-page-to-load/render/scroll/etc. For that usage, I need at least a word or so of the title - N tabs together with the same favicon. Current width mostly works for this. I find (perhaps because I'm a mozilla user, and a tab-hoarder) that Chrome's UI is DREADFUL for anyone with lots of tabs - and probably intentionally to push users into closing them, since large numbers of tabs simply doesn't work as well in Chrome as in Firefox. So mimicing their behavior is NOT a good idea, IMHO. I totally hate the favicon-only view I'm seeing now. it's miserable. If I were a user, and didn't know about the hidden pref, I'd downgrade, and if it wasn't fixed soon I'd look for another browser (or some extension). That's how bad it is. That said: there are other solutions possible. There's the infamous "Add something to user Preferences". There's "publish an extension that lets you fiddle the width" (doable today). You could put it on the right-mouse-button on the bar. etc. Or... you could do some sort of apple-dock-like "tabs (in the bar) expand as you move the mouse over/near them", which allows much better "Find the tab I want" than narrow tabs, and even better than 110px/etc because you can see more/all the title without having to wait for a hover to take hold (and you could see more of the ones near the mouse, probably. Of course, that's a lot easier said than done. :-) >Better still would be making vertical tabs a selectable option in >about:preferences as a common (and unique to Firefox[1]) power-user option >rather than making us find and install an add-on. Yes! >> I feel that the many-tabs use case is treated as a stepchild in >> Firefox. >> >> Notably the drag-to-reorganize scheme does not work well for many >> tabs; a tab pane a la what tabgroups had would be superior. > >Completely agree. Even if we don't support "groups" of hidden tabs, the >tableau visualization was great for managing overstuffed windows and >quickly culling the tabs spawned during some now-complete task or research. yes! I actually often cull tabs from session-restore (vertical list with titles!) or from about:tabs (Tab-stats extension, now broken -- Glandium!?) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Follow up on clang-format
>On Thursday, September 28, 2017 at 10:15:05 AM UTC+2, Chris Pearce wrote: >Oh d'oh! Looks like I replied to an old thread, and the plan now is in fact >to clang-format the entire tree after 57. Sweet as! Where did you find that? Was this plan communicated? It's not in this thread... (which started with smaug replying to a 2014 post by Anthony). Personally, this will mostly make me sad/annoyed and make figuring out why code was landed a pain (can be a real pain for people who do a lot of reviews), but I'll deal with it. I'm more concerned that this wasn't communicated (or discussed!) well, since 58 is already open... -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Coding style: Argument alignment
>On 8/31/17 6:50 PM, Ehsan Akhgari wrote: >> On 08/30/2017 09:03 PM, Eric Rahm wrote: >>> Okay sounds like there's agreement on a). >> >> Indeed. I have edited the Style Guide to reflect this. > >Fwiw, I disagree with this decision. > >Here's a more realistic example of *actual use* from our code base: >http://searchfox.org/mozilla-central/rev/d29c832536839b60a9ee5b512eeb934274b879b0/layout/forms/nsFormControlFrame.h#57-64 > >This example shows that it's easier to read arguments at a glance >when they are aligned, IMO. I agree with Mats here - it's easier to read, specifically in the case where you have a large number of args, especially of widely varying type-lengths. Obviously, one could disagree, but that's a good example. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: disabled non-e10s tests on trunk
>you could edit the taskcluster/ci/test/tests.yml file and add >|linux64/debug: both| in the places we have |windows7-32/debug: both|, as >seen here: >http://searchfox.org/mozilla-central/source/taskcluster/ci/test/tests.yml#138 > >If there are concerns with windows build times, please ask for those to be >faster- it will make it better for all of us :) Where's my magic wand...? if it was easy to make them faster I presume that would have been done long ago. I frequently do (only) linux Trys unless I suspect platform-specific issues. Saves resources... Can't "mochitest-media" (and other non-e10s versions) be not-run by default, unless you specify it explicitly in -u ? -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Phabricator and confidential reviews
>> Bite the bullet and at least make all CC'd people able to see all >> patches, always. It's needed. > >Yeah, that's the direction I think we should take. Good, thanks. >For now, we will implement exact syncing of the CC list + reporter as the >revision's subscriber list. This means that anyone being added as a >subscriber who isn't CCed will unfortunately be quickly removed (we may >even prevent manual additions to the subscriber list). The reasoning here >is that I believe it's important that anyone looking at the bug knows >exactly who can see the patch. If the two lists are kept in sync, it will >be obvious. However, if we let people add subscribers directly, then it >may turn out that a bug with only, say, 10 CCed people might have a patch >that 100 people can view, and that's probably something you don't want to >hide. > >We'll also investigate what bidirectional syncing would take. Due to race >conditions and circular references, though, it won't be straightforward, >and I suspect our time would be best spent elsewhere. Don't bother with bidirectional syncing - your point about being able to look at the bug and know who can see it is important, but I don't see any advantage in subscribe mirroring into cc. I would (as you suggest) block subscribe on sec issues, to avoid confusion (if you can, tell people to use cc, but that's just gravy). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: ./mach try fuzzy: A Try Syntax Alternative
>+to...@lists.mozilla.org > >There have been a bunch of new features added here I'd like to highlight: > - --env: You can now set environment variables in your tasks directly > from the command line, e.g: > - ./mach try fuzzy --env XPCOM_DEBUG_BREAK=stack --env > MOZ_LOG="example_logger:3"| This is *awesome*; I've been wanting this for YEARS. Does this work without 'fuzzy'? > - --save/--preset: Works the same as try syntax, using the --query > argument. E.g: > - ./mach try fuzzy --save reftest -q "!cov !pgo 'reftest !jsreftest" > - ./mach try --preset reftest Also really great. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Quantum Flow Engineering Newsletter #22
> - We found out that MediaStreamGraphStableStateRunnable is #20 on this > list, which was surprising as this runnable is only supposed to be used for > WebRTC and WebAudio, neither being extremely popular features on the Web. > Randell Jesup found out that there is a bug > <https://bugzilla.mozilla.org/show_bug.cgi?id=1395012> causing the > runnable to be continually dispatched after a WebRTC or WebAudio session is > over. This may have already been fixed; after updating and restarting I've been unable to reproduce, and padenot indicates he fixed a shutdown bug in MSG in the last week or so before I tested.. Also, further analysis by billm showed that only 10% of sessions had MSGStableStateRunnables, but in those 10% it was the most frequent runnable. This may be as-expected, since when in use the runnables are used to update MainThread media state 50-100 times per second. We hope to at some point move (most of) these runnables to atomics for updating MainThread, and we'll look to see if we can throttle the events/s from MediaStreamGraph. Note that monitoring entries in about:telemetry can be handy for tracking down issues. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Phabricator and confidential reviews
>On Wed, Aug 9, 2017 at 11:32 AM, Mark Côté wrote: > >> I actually like Gijs's proposal, to mirror *from* Phabricator *to* BMO. >> That way, if you're looking at the bug and want to pull someone in, you CC >> them; if you're looking at the fix and want to involve someone, you add >> them as a subscriber which then incidentally lets them view the bug, for >> background and updates and such. > > >What if there's not "the" fix but multiple patches? This is quite common >for security bugs where a testcase that reveals the vulnerability is done >in a separate patch so it can be checked in after we release the fix. Or >occasionally bugs like https://bugzilla.mozilla.org/show_bug.cgi?id=1371259 >that had 9 separate patches. Would the same people have to be separately >subscribed to each? I've seen landings (for non-security bugs) that involved up to 100 patches on one bug... or even more. While that's probably never happened for a sec bug, multiple patches is not unusual (and in fact common where there's a test added). >And don't forget reporter and assignees. Occasionally a reporter not in the >security group will notice that a patch is insufficient which is nicer to >find before the patch is committed than after the commit link is added to >the bug. Sometimes someone other than the official assignee adds a patch >for an alternate approach to a fix and asks the assignee for feedback, >though that's uncommon and the assignee could just be manually subscribed >to the patch at that point. In fact it's quite common for people who are cc'd other than the patch writer and reviewer(s) to look at the bug and comment on it, or to submit an alternate or modified version of a patch previously uploaded. And as Dan points out, frequently the reporter (when an external 'researcher') will weigh in on the patches being posted or use them to verify if they fix the problem they found (since many times only they can reliably reproduce the problem - private test setups, drivers, HW, etc). >We can wait and see how common the "please subscribe me to the patch" >requests are, but I suspect they'll be common enough. Bite the bullet and at least make all CC'd people able to see all patches, always. It's needed. Another aspect of all this that needs to be thought out and verified is what happens when a non-sec bug becomes a sec bug (and vice-versa, though I'm far less concerned about that). When a bug becomes a sec bug, all patches associated with that bug must become confidential. And when a bug moves to be open (not sec-restricted), the patches should also become open. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Phabricator Update, July 2017
>On 2017-07-14 1:31 AM, Jim Blandy wrote: >> Many people seem to be asking, essentially: What will happen to old bugs? >> I'm trying to follow the discussion, and I'm not clear on this myself. >> >> For example, "Splinter will be turned off." For commenting and reviewing, >> okay, understood. What about viewing patches on old bugs? > >Migration-specific issues are still in flux, since we have still have >plenty of time to sort them out. I'll be clarifying the migration plan as >we get closer to turning off old tools. So far we've agreed that keeping >Splinter in read-only mode to view old patches is totally fine, and Dylan >(BMO lead) mentioned that there are even nicer diff viewers that we can >integrate with BMO, like https://diff2html.xyz/, which is currently in use >by Red Hat's Bugzilla installation. Good! your recent posting made me believe you'd gone back to "splinter must be totally turned off". Thanks. I still have significant questions about how security-bug access and review (and security of such items) will work; is there a design? What has changed since our discussion in May, and have you met with the security engineers and major security reviewers/triagers? Thanks -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Phabricator Update, July 2017
>On 7/13/17 9:04 PM, Mark Côté wrote: >> It is also what newer systems >> do today (e.g. GitHub and the full Phabricator suite) > >I should note that with GitHub what this means is that you get discussion >on the PR that should have gone in the issue, with the result that people >following the issue don't see half the relevant discussion. In particular, >it's common to go off from "reviewing this line of code" to "raising >questions about what the desired behavior is", which is squarely back in >issue-land, not code-review land. Yes, exactly. Once the data is in two places, there's no way to avoid some discussions occuring in the "wrong" place. And Github shows it clearly happens all the time, so you still have to flip back and forth between looking at N review-streams of comments, and the bug, and maybe looking at the dates of comments, in order to understand everything. >Unfortunately, I don't know how to solve that problem without designating a >"central point where all discussion happens" and ensuring that anything >outside it is mirrored there... Ditto - I think it's inherent if you can reply to review comments in the separate tool. Making the review-tool *worse* for commenting (features/ease-wise) can actually kinda help, at the cost of hurting responding to review comments. Googe's Issue via codereview tooling kinda takes this approach, from what I've seen, with *very* non-integrated patches vs bugs. GPS's comments about the pain of syncing also makes total sense, in that MozReview tried to mostly attach to existing API points of bugzilla and do things noone anticipated would be done. I don't think limiting comments in phabricator is a viable option, nor making it less-featured, so we may have to eat the pain and lash people with wet noodles if they post in the "wrong" place. -- Randell Jesup, Mozilla Corp remove ".news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Phabricator Update, July 2017
>On Wed, Jul 12, 2017 at 11:27 AM, Byron Jones wrote: > >> But indeed having also the patches in bugzilla would be good. >>> >> no, it would be bad for patches to be duplicated into bugzilla. we're >> moving from bugzilla/mozreview to phabricator for code review, duplicating >> phabricate reviews back into the old systems seems like a step backwards >> (and i'm not even sure what the value is of doing so). > >I find this a strange argument. We don't know how successful phrabricator >is going to be. The last attempt to switch review tools seems to be >getting killed. I think its somewhat reasonable to be skeptical of further >review tool changes. I quite agree... And I hate the idea of having the data spread across systems (which I also disliked in MozReview, which I tried and it ended up being really bad for some aspects of my ability to land patches). >Also, I have to say the whole phabricator thing has been kind of presented >as a fait accompli. As someone who continues to use splinter on a daily >basis I'm 1) skeptical and 2) kind of unhappy. > >Maybe phabricator will be great, but I'll believe it when I see it. Please >don't force future data and workflow into an as-yet unproven tool. This while sequence strikes me as "Let's launch our new boat, and oh by the way we're going to sink the old boat(s) so no one is tempted to not move to the new boat. And, BTW, we're leaving behind some important things, and we'll think about how we can deal with those sometime, maybe before we launch, maybe later, maybe never." While this may force people onto the new boat, I'm seriously unconvinced this will be better in the short term, or that it won't end up slowing down N*100 developers, either for a while or permanently. And the lack of transparency in developing this plan is also very concerning. IMO -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Phabricator Update, July 2017
>>> To answer the other part of your question, MozReview will be disabled for >>> active use across the board, but it is currently used by a small number of >>> projects. Splinter will be disabled on a per-product basis, as there may be >>> some projects that can't, won't, or shouldn't be migrated to Phabricator. >> >> Splinter is still a nice UI to look at patches already attached to bugs. >> Please don't disable it. > >excellent point; thanks for that feedback. > >instead of disabling splinter for phabricator backed products, we could >make it a read-only patch viewer. I would consider it a massive fail if we disabled at least read-only splinter viewing of patches. As noted before. let's make sure we don't lose things we agreed on as issues. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to unship: moz*Frames attributes of HTMLVideoElement
>On Monday, July 10, 2017 at 3:28:07 PM UTC+12, Randell Jesup wrote: >I added these stats originally, and they are now mostly superseded by the >stats provided by VideoPlaybackQuality. So I support their removal (in fact >I suggested to Tim that he remove them). > >Adding telemetry to learn how often these stats are used in the wild seems >like a good idea. I'd be surprised if these are still used, but I'm happy >to be proved wrong! I know various webrtc app developers have asked about how to determine if video elements are showing frames, which they've used as "is video still flowing" indicators, among other things. We can try to find alternative ways to get them what they need; our tests for example use it to know when to sample the image to see if media is successfully flowing end-to-end, and to know when to sample with a canvas to see if the "right" color is there (IIRC) >Randall: doesn't VideoPlaybackQuality support the use cases you're concerned >about? If VideoPlaybackQuality is supported for srcObject streams, and totalVideoFrames works (frames as received from the MediaStream by the video element), then it can substitute. The other fields in the VideoPlaybackQuality object are likely much harder to hook up (or define) for WebRTC or MediaStream/srcObject inputs in general - it would imply additional streams of metadata flowing independently of the actual frames (i.e. droppedFrames would need to be updated even though no frame was sent down the stream from the PeerConnection.) And there are other sources of MediaStreams, in particular mediaelement.captureStream() and canvas.captureStream(); someone needs to define how those interact with srcObject and VideoPlaybackQuality (and fun cases like PeerConnection -> MediaStream -> MediaElement -> canvas (webGL, etc) -> captureStream -> MediaElement) I presume no one working on VideoPlaybackQuality was thinking about non-streaming sources for video when they defined it. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Switching to per-channel profiles
>What happens when users do that? Because they do. > >A variety of kinda-horrible things will happen. > >The two copied profiles will compete for the Sync client record. That means >sent tabs will appear on one or the other, the Tabs from Other Devices list >will flip-flop between each of the two devices, the count of devices will >be wrong for scheduling purposes, and the two devices will unpredictably >handle bookmark repair requests, leading to distributed inconsistency. Ouch. Glad when I copied profiles I didn't (and still don't) use Sync; having this happen would be ... surprising - and painful, sounds like. I wonder if it's possible to flag moved (or copied) profiles, and force them to go through some sort of re-sync (effectively become a "new" client for the account, even if it was only moved to a new disk location). That *should* in theory avoid the consistency problem, and only annoy users without reason in the odd case where the moved their profile(s). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to unship: moz*Frames attributes of HTMLVideoElement
>On Fri, Jul 7, 2017 at 11:54 AM, Jet Villegas wrote: > >> What do we expect to break? > > >I can see that video quality auto adjusment which is based on these APIs >will become malfunction. But, I don't know is this a real use case that a >website implement video quality adjusment based on these APIs. This will break a number of our internal mochitests. I'm not sure without some checking if the standardized properties can replace them (I think the answer is no in some cases). We could put them behind a pref for testing use. > >> Who's out there using these APIs now? >> > I do believe a number of webrtc services may use these; it's hard to be sure because many of them are not easily findable by searching github or search engines. >There are some addons using these APIs to report media statistics. For our >internal use, there are only some test code using them. And I think the >VideoPlaybackQuality can be used to replace them since it provides a >similar feature. In webrtc and related use, quality is not what these measure; we want to know if frames have arrived or been painted (generally using mozPaintedFrames). There is one test using mozParsedFrames. Before landing any deprecation warnings, you should check with the media/webrtc teams, and we may want to check how one or two external users are using it. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: switch to macosx cross-compiled builds on taskcluster on trunk
>At 11:00PT today, we will be landing patches to run mac opt builds on trunk >as cross compiled builds on Linux machines on taskcluster. As this change >is uplifted to m-c, nightly builds for mac will also switch to run on >taskcluster on Linux. We will be testing to ensure that updates work as >expected. We don’t expect any impact to developers as this has been tested >extensively on a project branch as well as by manual testing by PI. Does this have affect on our still using the 10.7 Mac SDK? I've had to add a number of hacks to imported code to make it buildable on 10.7 SDK, and we can't use certain newer features/APIs because of it. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: BaseThreadInitThunk potential collateral benefits
Liz wrote: >I'm so happy for this hammer to slam down. Goodbye, at least a good bit of >horrible DLL injections and crashes, and horrible BaseThreadInitThunk. >Thanks for the explanation. The suggestions for next actions sound good >too. Hear hear! And thanks to Carl (and David Durst, and JoeH) for jumping on this when Liz and I (and others) from the regression triage group put the screws to Joe Hildebrand (I threatened to withhold desert!) to prioritize it. >On Mon, Jun 5, 2017 at 2:36 PM, David Durst wrote: >> I wanted to call attention to https://bugzilla.mozilla.org/ >> show_bug.cgi?id=1322554 (Startup crashes in BaseThreadInitThunk since >> 2016-12-03) -- a startup crash resulting from DLL injection early in the >> process -- which was resolved fixed for 32bit Windows on 5/24. >> >> This particular fix[0] has the potential side effect of solving other bugs >> as well, ones that may have very different signatures that deal with >> remotely-injected code[1]. >> >> Just a caution that it's not going to impact all injection issues. We know >> that there is an as-yet-unidentified increase and decrease in this crash >> prior to patch landing[2]. Carl Corcoran, who landed this, will be looking >> at that time period as well to attempt to find contributing causes (any >> help diagnosing is probably welcome, though I defer to ccorcoran). >> >> Also, just a shout out that this was Carl's first patch. >> >> Thanks! >> >> >> [0] The proposed/chosen solution is roughly here: >> https://bugzilla.mozilla.org/show_bug.cgi?id=1322554#c69 >> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1322554#c168 >> [2] https://crash-stats.mozilla.com/signature/?product= >> Firefox&release_channel=release&signature=BaseThreadInitThunk&date=%3E% >> 3D2016-12-05T14%3A42%3A31.000Z&date=%3C2017-06-05T14%3A42%3A31.000Z#graphs -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: The future of commit access policy for core Firefox
e all the >time. So you have to pick a reasonable default then debate about adding >complexity to placate the long tail or handle corner cases. Standard >product design challenges. On top of that, technical users are the 1% and >are a very difficult crowd to please. I've dealt with these by handling rebases myself (i.e. reviewers don't get involved), with the rare exception that a rebase requires non-trivial rewrite, in which case I just ask for re-review or re-review on that part, or write the rebase as a new patch on top of the reject. This is rarely needed, however. Interdiffs for responding to review comments - sometimes (trivial patches) I just replace and r? the entire patch. More complex ones I'll set up as a separate patch (interdiff) and ask them to review that, then fold/merge before checkin (yes, I use mq and splinter still - but the general idea would apply regardless). And yes, mozreview does a MUCH better (and more automatic) job of dealing with interdiffs automatically. But that exposes you to the rebase problem. As you say, there isn't an easy answer. Or just "never rebase until the patch is "done", then deal with rebasing all at the end. (And, of course, for some long-in-review patch queues, this won't be possible.) >I'm sensitive to concerns that removal of "r+ with nits" will provide a net >productivity regression. We should be thinking of ways to make landing code >easier, not harder. This is a larger discussion spanning several topics, of >course. But the point I want to make is we have major, systemic problems >with code review latency today. Doing away with "r+ with nits" exacerbates >them, which is part of the reason I think people feel so passionately about >it. I feel like there needs to be a major cultural shift that emphasizes >low review latency, especially for new and high volume contributors. Tools >can help some. But tools only encourage behavior, not enforce it. I feel >like some kind of forcing function (either social/cultural or formal in >policy) is needed. What exactly, I'm not sure. At this point (where I see >very senior engineers with massive review queues), I fear the problem may >have to be addressed by management to adequately correct. (Yes, I hated >having to type the past few sentences.) We have greatly improved review latency in the last year or two (an extreme case was certain people left reviews active for 9 months or a year, because they disagreed with the patch direction). It's much better now, but it isn't perfect, and trying to drive it a lot lower can have negative impacts that may be hard to measure. C.f. Peopleware, and the impact of interruptions on productivity (and quality). Many people don't read email for stretches because they're busy working on something (or meetings, etc). Some even disconnect from the internet to focus (roc did). People have PTO, they have doctor's visits, they have p1 quantum bugs that are blocking other people - not all reviews are equal. >I think Steven MacLeod's response was highly insightful and is worth >re-reading multiple times. Pursuant to his lines of thought, I think there >is room to have more flexible policies for different components. For >example, high-value code like anything related to cryptography or security >could be barred from having "r+ with nits" but low-impact, non-shipping >files like in-tree documentation could have a very low barrier to >change. Having tighter requirements for select portions of the tree would be ok. We effectively have that for things like NSS since it's imported periodically. Such a limited tightening would be very different than what's proposed. >Of course, this implies flexible tools to accommodate flexible workflows >which may imply Mozilla's continued investment in those tools (which is in >contrast to a mindset held by some that we should use tools in an >off-the-shelf configuration instead of highly tailoring them to our >workflows). I like the simplicity of one policy for everything but am not >convinced it is best. I'd like more details on what other large companies >do here. > >I hope this post doesn't revive this thread and apologize in advance if it >does. Please consider replying privately instead of poking the bear. I did consider it. But I can't be quiet publicly with a posting stating "this *will* happen" (emphasis added) on the table. I certainly know of the vulnerabilities here but I also see how much friction for our development cycle might be caused, with little real-world benefit (IMHO). The last thing we need is to move a lot slower. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Fwd: Changes to code review tools and processes
>This was posted on mozilla.tools yesterday but doesn't seem to have made it >to dev.platform. It is likely to be of interest to many people on this >list. Note the comment "If you have questions or concerns beyond those >addressed below, please >use the mozilla.tools group and/or #phabricator on IRC and Slack to raise >them. " in the message I tried cross-posting my mozilla.tools response to here, but it appears to not work. I had extensive comments in mozilla.tools about a number of issues; I strongly advise anyone even slightly interested go there and read them. Most important were issues around who trialed it, security bug/patch access, control of data, archiving, ability to look at old patches as more than a raw diff, and where review comments live. (And more). (Mark responded, and I followed up with more comments) They do plan to put up a Wiki page on all this soon. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Is there a way to improve partial compilation times?
>On 07/03/17 20:29, zbranie...@mozilla.com wrote: > >> I was just wondering if really two days of patches landing in Gecko should >> result >> in what seems like basically full rebuild. >> >> A clean build takes 65-70, a rebuild after two days of patches takes >> 50-60min. > >That seems pretty normal to me nowadays. My very vague impression is that >this has gotten worse in the past few months. Nowadays I assume that every >m-i to m-c sync (twice a day) will entail more or less a full rebuild. someone hits Clobber, or someone touches an MFBT or xpcom header, and poof, full rebuild. >I would also say that, if your machine takes 65-70 mins for a clean build, >and that's on Linux, you need a faster machine. I reckon on about 23 mins >for gcc6 "-g -Og", and clang can do it in just about 20. Linking is also >a bottleneck -- you really need 16GB for sane linking performance. Right. My 5.5 year old desktop Linux machine (admittedly 1 (old) XEON, 16GB, SSD) does builds in around 25ish min (up from about 12 a couple of years ago!) I suspect the issue is the laptop has a lower-HP/MHz CPU (maybe), and/or the cooling solution is causing thermal throttling - mobile CPUs often can't run all-cores-flat-out for long in typical laptop cooling. Windows builds slower. Of course. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Sandboxing plans for Media, WebRTC and Networking
Since some discussions at the all-hands around audio sandboxing and mtransport's blocking linux sandbox tightening, and discussions with the necko team, I decided to explore the options available to us, before we got too far down a path-of-least-resistance. The following is the result of that analysis and discussions with people involved (Media/WebRTC/Networking/Sandboxing teams). The full document with details on the pluses and minuses of each option is here: https://docs.google.com/document/d/1cwc153l1Vo6CDuzCf7M7WbfFyHLqOcPq3JMwwYuJMRQ Randell Jesup Media, WebRTC and Network Sandboxing plans This document is meant to lay out the options for Media and Webrtc sandboxing, as well as options for Necko. Changes need to made here in order to tighten the Content sandboxes further, in particular for Audio input/output and because some mtransport code (ICE/TURN) uses OS calls for IP/interface discovery that we want to lock down. Before we start making changes, we should vet the design and determine if it makes sense to move more code & where the moved code should live - Master (“Chrome Process” in MDN docs), Content, or a separate sandbox. Conclusion: I recommend two new sandboxes in the short term (option 4.5 below) - one for Audio in/out and Video in, one for mtransport. (We combine with the audio/video one if the sandboxes cause too much (memory, startup) overhead.) For later stages, we should first consider sandboxing Necko protocols (especially new/external ones) ala GMP, and finally we should consider adding to the media sandbox MSG, WebAudio, gUM, and PeerConnection/pipeline code. That last would be a much larger change, though with some significant architectural security advantages potentially. The current state: * Video input lives in Master (CamerasParent) and talks to CamerasChild in Content * Video display is handled by Compositor (via independent sandbox and IPC channel) * Audio input and output are handled in Content - both by cubeb in full_duplex, and output by cubeb and input by webrtc.org code for non-full-duplex. * WebRTC networking code (media/mtransport) handles ICE/TURN and routing RTP/RTCP over the ICE/TURN channels via IPC, using UDPSocket/TCPSocket code with packet filters. * mtransport talks to PeerConnections and JSEP/signaling * Some IP/interface discovery code in mtransport makes OS calls from Content * PeerConnections run in the Content code * WebRTC codecs run there too, except for things like OpenH264 (in GMP sandbox) * MediaDecoders run in Content (jya has made some changes here on some platforms; DXVA decoders run in the GPU process) * Codecs run in Content, GPU , or in GMP sandboxes (especially EME) * Mac may do similarly in the future * MediaStreamGraph (MSG) runs in each Content process, as needed. In some cases if used by chrome code it will run in Master as well. * MSG runs either off a timer (if there’s no sound output) or off of audio driver callbacks on OS driver threads (with sound output). * There can be multiple MSGs in the same process (for example for disconnected non-realtime MSGs used for WebAudio, and in the future multiple MSGs to handle output device selection) * WebAudio runs as part of MSG * Necko code (used by HTTP/etc and by mtransport for low-level IO) runs in Master This includes protocols, encryption, cache and many other bits Needs: * For sandboxing work, we want to lock down Content process such that audio must move out of Content. * Deadlines for output data must be met to avoid underruns; tougher though doable when the request must transit IPC * mtransport needs to stop making OS calls from Content Wants: * Minimize complexity * IPC adds complexity, especially when anything is synchronous. * Complexity adds bugs * Complexity slows down maintenance and makes code harder to understand * Security * A sandbox is more secure than the Master process * Especially for code that touches arbitrary data from outside - i.e. codec data (encode (can come from canvas) or especially decode), and network data (packets can contain most anything if it gets by the packet filter, which is mostly about origin). * Complex code (codecs) often has hidden bugs, and fuzzing helps but doesn’t guarantee holes don’t exist. * ICE/TURN is pretty complex networking code, and a good chunk of it is in legacy C (no refcounting/etc). * Developing new wireline protocols (such as QUIC) in userspace adds new risks, especially as fixes and improvements will be rapidly iterated on, and they’re exposed to crafted packets. * Failure of the code in a sandbox doesn’t give up the whole system * Firewalling (sandboxing) vulnerable media/networking code from Content is also useful, since a break of the media/networking code would require a sandbox escape to get access to raw content (passwords, bank data, etc). * This is part of why GMP is used (though not the only reason). * Running imported code in a sandbox, e
Re: Intent to implement and ship: RTCDTMFSender
>>David, do you know whether Edge is generally planning to implement this too >>as they implement the WebRTC API? What about Safari? > >Edge is generally onboard with the DMTF API from what I know, though I >haven't discussed specifics with them. We did talk about it a little at >Cluecon (Freeswitch conference) two weeks ago, in a panel discussion. Also, the MS guys indicated they will be launching Skype over WebRTC (in alpha now), and Skype For Business (aka formerly Lync) via WebRTC soon. Skype and especially Skype For Business need DTMF support, and they would like to see the API in Firefox. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to implement and ship: RTCDTMFSender
>On 8/26/16 8:30 AM, Anne van Kesteren wrote: >I filed a few more. Mostly editorial, but >https://github.com/w3c/webrtc-pc/issues/775 is substantive. Thanks. We'll get that dealt with. >David, do you know whether Edge is generally planning to implement this too >as they implement the WebRTC API? What about Safari? Edge is generally onboard with the DMTF API from what I know, though I haven't discussed specifics with them. We did talk about it a little at Cluecon (Freeswitch conference) two weeks ago, in a panel discussion. MS guys were in remotely using Edge, connecting via webrtc to a Freeswitch conference (which uses adapter.js). (Video in Edge (H264 only so far) requires flipping a flag in about:flags in the anniversary release of win10.) We had someone in via Safari (using a plugin for webrtc) as well. Freeswitch developers are very interested in DTMF support since they're generally PBX/IP-phones and SIP trunking/etc people. Apple of course never says anything, but they're clearly working on WebRTC support. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Rationalising Linux audio backend support
>I just landed some telemetry to measure the usage of all audio backends, >we'll have data soon. > >This was bug 1280630, and the probe is at [0]. This also measures >failures to open a stream and usage of backends that should not be used >on certain platform, like winmm on windows vista+. > >Also I support this proposal. We have some data now; in Aurora 50 it's 96.5%/3.5%, Nightly 51 is 98%/2% Of course, even Aurora users aren't necessarily "typical" users, but I imagine these will not move much in the favor of Alsa in Beta or release, and more likely move towards Pulse ("users" are more likely to be running plain-jane distros which have Pulse enabled by default - but we'll see). We'll start getting beta results in a few weeks. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: realtime audio on linux
>On 07/14/2016 05:36 PM, ajo...@mozilla.com wrote: >Last I knew, JACK was the only way to get basically no dropouts and still >be able to do nontrivial audio processing. But a JACK backend for the >browser just seems kind of silly; it's too much of a niche "market" to try >for anytime soon. Likely so, though I'll note that Jack support for cubeb + Firefox has recently been updated by a contributor; so you can build with a Jack backend (or so I understand; I haven't tried it). Padenot or achronop can tell you more about this. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Checkin-needed requests - Please include complete information in the commit message :)
>On Mon, Jul 11, 2016 at 2:18 PM, Xidorn Quan wrote: >> I also use checkin-needed for small changes which I don't think it's >> worth to run a full testset for, to save some infra resources. > >Hmm, that's an odd optimization. I'd have thought that sheriff time >is more valuable than infra. It's not just the the value of sheriff time vs infra cost, it's the delta cost to other developers who end up waiting longer in Try/etc queues before *they* can land. When you fairly often see 4/6/12 or ever 18/24 hour turnarounds on Try, anything that noticably ups the load on infra when there's a low chance of catching something is a bad trade. Of course, this depends on the devs being smart about what they skip try/autoland on (for example, nit-fix changes are a good choice - usually). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Shutdown leak detection - is there a better way?
I wrote the following in bug 1270308 comment 14, and thought it might be worth mentioning/discussing here. Not a new observation, of course, but perhaps worth thinking about. I should re-emphasize the effort that goes into making clean shutdown work without intermittent is a LOT of work for a lot of people. That said, any radical change here is also a lot of work, at least to change over, but it could save time in the long run, and might incrementally improve security. Randell (In reply to Andrew McCreight [:mccr8] from comment #13) > (In reply to Jan-Ivar Bruaroey [:jib] from comment #12) > > Aren't there other ways we could find runtime leaks? > > Unfortunately, there are none that I know of. In order to check for leaks, > you have to know whether every live allocation should be alive or not. > Waiting for shutdown and declaring that nothing should be alive is the > easiest way to do this. Right, though there are a couple of ways to do that. One is what we're (mostly) doing, which is to unwind everything and free, then see what's left. The other way (which we do with a few bits) is to mark allocations or classes are being ok to exist at exit (some of Mutexes for example IIRC). Big advantage there is cleanup/unwind code has risks of it's own and can introduce race conditions. Downside of this is that the analysis of leaks is trickier; you end up needing basically a GC scanner to see of allocations are tied to a "ok to leak" root. That said... overall I think that's a better/safer/overall-lower-effort (at least ongoing) than the complexity of the unwind code, just to avoid leaks. Also: real leaks we care about are either large single allocations, or allocations that can increase in number the more times you do something. (Which includes leaks-occasionally-due-race). One-off leaks (leak of a single pre-process instance of something) are at most an annoyance unless large or eating CPU. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Firefox Hello new data collection
>The privacy review bug is >https://bugzilla.mozilla.org/show_bug.cgi?id=1261467. >More details added below. >> On 04/04/2016 10:01, Romain Testard wrote: >> >>> We would use a whitelist client-side to only collect domains that are >>> part of the top 2000 domains (Alexa list of top domains). This >>> prevents >>> personal identification based on obscure domain usage. >>> >> >> Mathematically, the combination of a set of (popular) domains shared could >> still be uniquely identifying, especially as, AIUI, you will get the counts >> of each domain and in what sequence they were visited / which ones were >> visited in which session. It all depends on the number of unique users and >> the number of domains they visit / share (not clear: see above). Because >> the total number of Hello users compared with the number of Firefox users >> is quite low, this still seems somewhat concerning to me. Have you tried to >> remedy this in any way? >> > >We are aggregating domain names, and are not storing session histories. >These are submitted at the end of the session, so exact timestamps of any >visit are not included. There's been a bunch of surprises over the last few years where "anonymized" data turned out to be de-anonymizable. This is the sort of data that feels like it could lead to surprises. I think this would need more looks by someone who actually understands that and where those risks come from (not me). There are added risks if you include the case of someone using our data *and* data from one or more 3rd-party sites, and that's not easy to reason about, which is why this needs careful consideration. >> Finally, I am surprised that you're sharing this 2 weeks before we're >> releasing Firefox 46. Hasn't this been tested and verified on Nightly >> and/or other channels? Why was no privacy update made at/before that time? >> > >We are shipping Hello through Go Faster. The Go Faster process allows us to >uplift directly to Beta 46 directly since we're a system add-on >(development was done about 2 weeks ago). >Firefox Hello has its own privacy notice (details here ><https://www.mozilla.org/en-US/privacy/firefox-hello/>). Since the collection is not enabled currently anywhere, how known-stable is it for beta? Having the code in a disabled state safely is one thing; having it known to be safe to turn on is another. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Exit code -11 must die
>On 2/27/2016 9:06 PM, Randell Jesup wrote: >> months until recently it popped up a bit). Note that this failure >> *never* results in a crashdump, and I've never seen it locally, just in >> Automation. > >What we do know: > > * Exit code -11 is evidence a SIGSEGV (crash). > >This I don't know, but somebody may know (+ted): > > * Are we sure that the crash is happening in firefox.exe? Or is it > possible that some other process is crashing and taking down our > test harness with it? > * Can somebody point to exactly what line of code in the test harness > collects the -11 code? See Andrew's comments; I don't know > * Is there no crash dump because the crash reporter is turned off? > o If it's turned on, is the crash reporter itself crashing? Or is > the test harness not picking up the crash dump? I doubt it's turned off; but it may for some reason be unable to catch it. >> We *need* to find some solution to it -- even if it's to decide it's a >> (safe) artifact of some underlying problem outside of our control. > >Is "we" you? Are you asking somebody else to help you with this, or own the >problem completely? I need help here, or preferably someone to own this, as I'm out of my area of expertise trying to get stuff to run and debug it on the Try VMs. I have a loaner; my (very painful) attempts to run the same tests with same packages there haven't reproed it. Ryan will confirm that this has been messing with us all over the tree heavily for at least a year. It may be that they're all valid Shutdown bugs that get tagged to "whichever test ran last" - if so fine, but we need *some* way to get useful info out about what/why/where it crashed. >> I'd far rather find a true cause and either fix or wallpaper it. But right >> now it's stopping me from landing some important code changes. >> >> On the plus side, I have a nice Try run which will cause it 100% of the >> time - though when I tried to provoke it on a loaner Test VM after >> painfully emulating what's needed to run tests, it wouldn't fail -- but >> I don't trust that was a well-setup recreation of a real Try run. >> >> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2eb01359621 >> >IIRC, there was recently a post about how you can submit a try job and have >the VM stay alive afterwards for postmortem debugging. I don't >remember/can't find the details right now I think that's only for TaskCluster jobs; these aren't. >Can we also submit a try job with rr enabled, and get a recording of the >failure? That combination could lead to a pretty quick cause diagnosis of >this, since it's Linux. I'd love if we could do that, but rr means I *must* get into the machine or a clone thereof to run rr replay. >Also, does this failure happen if you disable all the tests except for the >one which is permafailing >(dom/media/tests/mochitest/identity/test_setIdentityProviderWithErrors.html)? >If so, that would make it easier to record and debug. I've tried disabling the failing test, and it just moves to a previous test, and disabled that and it moved again. It is most certainly permafailing, but it's also clearly touchy timing-wise, so disabling too much may make it disappear. May be worth trying, but has a really long cycle time. Thanks! -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Exit code -11 must die
We have a raft of intermittent oranges where we get "exit code -11". I'm trying to land a patch, which is causing a largely-unrelated set of tests (identity) to generate the infamous exit code -11 problem (there's an existing intermittent filed against that test; it had disappeared for months until recently it popped up a bit). Note that this failure *never* results in a crashdump, and I've never seen it locally, just in Automation. My changes apparently tweak the timings just a little, and whichever test runs last in the identity suite permafails - but only on linux64 opt non-e10s (at lower frequency in linux32-opt-non-e10s). Adding a patch which changes the order of some releases and what waits on what causes it to go from permafail to 10-25%. However, this error is not confined to the code I'm playing with. Currently a simple search for exit code yields ~75ish open intermittents that mention exit code -11: http://mzl.la/1TGByje We need to either get a handle on these, or decide we don't care. These have been widespread in the tree for a year now. There are many more closed as WorksForMe since the test that gets 'tagged' for the failure seems to not have much to do with the cause. Either it's something about channel/etc shutdown (like mine appears to be), or it's a landmine, or it's just timing. Often it will tag a test for a while, then disappear/move to another test and perhaps not come back (in that test; eventually people close the bug as WFM). We *need* to find some solution to it -- even if it's to decide it's a (safe) artifact of some underlying problem outside of our control. I'd far rather find a true cause and either fix or wallpaper it. But right now it's stopping me from landing some important code changes. On the plus side, I have a nice Try run which will cause it 100% of the time - though when I tried to provoke it on a loaner Test VM after painfully emulating what's needed to run tests, it wouldn't fail -- but I don't trust that was a well-setup recreation of a real Try run. https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2eb01359621 -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Presto: Comparing Firefox performance with other browsers (and e10s with non-e10s)
>TL;DR - Firefox does pretty well when compared to Chrome. Excellent! Cool stats! >We've begun this project by doing a set of performance runs on >WebPageTest.com in order to compare Firefox's performance against Chrome on >the home pages of the Alexa 100 (the top 100 web sites worldwide). We've >made a visualization tool which plots all of the runs on a graph, so we may >easily compare them: > >http://valenting.github.io/presto-plot/plot.html > >- click on a domain link to see results for each site. >- you are able to view the results sorted or unsorted, or filter by the >browser version >- you can also compare metrics other than load time - such as speed >Index, number of DNS queries, etc >- you can also compare the browsers on several connectivity profiles >(Cable, 3G, Dialup) >- You can click on each individual point to go to the WPT run and view >the results in greater detail What does "run index" mean in the graphs? The values appear to be sorted from best to worst; so it's comparing best to best, next-best to next-best, etc? Ah, and I see "sorted" undoes the sorting. I'd think displaying mean/median and std-deviation (or a bell-curve-ish) might be easier to understand. But I'm no statistician. :-) It also likely is easier to read when the numbers of samples don't match (or you need to stretch them all to the same "width"; using a bell-curve plot of median/mean/std-dev avoids that problem. Thumbnails, or columns on the right for each selected browser with median (or mean), with the best (for that site) in green, the worst in red would allow eyeballing the results and finding interesting differences without clicking on 100 links... (please!) Or to avoid overloading the page, one page with graphs like today, another with the columns I indicated (where clicking on the row takes you to the graph page for that side). >-- >Error sources: > >Websites may return different content depending on the UA string. While >this optimization makes sense for a lot of websites, in this situation it >is difficult to determine if the browser's performance or the website's >optimizations have more impact on the page load. Might be interesting to force our UA for a series of tests to match theirs, or vice-versa, just to check which sites appear to care (and then mark them). Great! It'll be interesting to track how these change over time as well (or as versions get added to the list). Again, medians/means/etc may help with evaluating and tracking this (or automating notices, ala Talos) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: nsThread now leaks runnables if dispatch fails
>> In bug 1218297 we saw a case where dispatch to a thread (the socket >> transport service thread in this case) fails because the thread has >> already shut down. In our brave new world, nsThread simply leaks the >> runnable. It can't release the reference it holds, because that would >> reintroduce the race condition we wanted to avoid, and it can't >> release the reference on the correct thread so it's already gone away. >> > >I am a bit confused with this, if the executing thread has already shut >down, why would releasing the reference dispatching thread holds >reintroduce the race condition? Who is racing with dispatching thread? It doesn't re-introduce a race. What it may do is cause the same effect as a lost race, whereby an object meant to be released on the target thread gets released on the sending thread. If the object is a non-threadsafe object (say, for example, a JS success or failure callback object), then releasing it on the sending thread is Bad. This is the same thing that can happen in a race where you have: What used to happen: { RefPtr bar = new FooRunnable(nonthreadsafe_object); target_thread->Dispatch(bar, ...); // Note: Dispatch in this (old-style) case takes a new reference to // bar. And the runnable may run and release that reference before // Dispatch() returns! And if Dispatch() fails, you always 'lose' // the race. } // bar drops the reference. If the runnable ran already, this is // the last reference and it gets destroyed here. If it hasn't run, // it gets destroyed on the target thread. New way (already_AddRefed<>): { RefPtr bar = new FooRunnable(nonthreadsafe_object); target_thread->Dispatch(bar.forget(), ...); // bar is always destroyed on the target thread - unless // Dispatch() fails, in which case it leaks (for safety). } or: other_thread->Dispatch(do_AddRef(new FooRunnable(nonthreadsafe_object)), ...); If you use the old-style (raw ptr Dispatch) as a lot of the tree still does, it will now leak on Dispatch() failure instead of possibly wrong-thread releasing: nsThread.h: nsresult Dispatch(nsIRunnable* aEvent, uint32_t aFlags) { return Dispatch(nsCOMPtr(aEvent).forget(), aFlags); } Since Dispatch(already_AddRefed<>,...) leaks on failure, this means that Dispatch(raw_ptr,...) now leaks on failure (which is what engendered this discussion). You can override this behavior with a check of the result of Dispatch() and a manual Release if it fails (and you know the object is entirely threadsafe). We could add DispatchFallible() (DispatchNeverLeaks()?) if that's a wide-enough pattern; part of my question in that case is "did the caller really expect Dispatch to possibly fail, and what implications does such a failure have to the rest of the system?" -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: nsThread now leaks runnables if dispatch fails
oMainthread; this has forced a number of tricks or bits of code to handle Dispatch failure. Once upon a time we didn't do very much with other threads, and also didn't pass JS references around, etc. We use threads a lot more and in much more complex ways now. As Nathan said in the bug: > Well, the runnables themselves are certainly safe to refcount > anywhere, but the contents of the runnables might not be... Sec bugs and crashes are worse than leaks, and they can occur if we have non-testable random off-thread releases, which is what we had, and still do for things not switched to already_AddRefed<>. I'd be good with flagging/asserting on Dispatch failures *before* shutdown begins as a step to making it infallible, since those have much more potential to be logic flaws and/or create accumulating leaks. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Disabling C++ tests by default?
>On Thu, Oct 1, 2015 at 10:25 PM, Mike Hommey wrote: > >> On Thu, Oct 01, 2015 at 10:10:39PM -0700, Gregory Szorc wrote: >> > Currently, the Firefox build system builds C++ tests by >> > default. This adds extra time to builds for something that a >> > significant chunk of developers don't care about because they don't >> > run them. >> > >> > An easy way to get faster builds is to disable C++ tests by default >> > (requiring a mozconfig entry to enable them - which would be enabled in >> > automation, of course). A more involved solution is to build them on >> > demand, when tests are executed (unless a build option says to build them >> > like today). >> > >> > I was curious if Gecko developers (the audience I perceive that >> > cares about them the most) would be generally opposed to disabling >> > building C++ tests by default. If not, we can go with an easy >> > solution (and require people who care to add a mozconfig entry). If >> > so, we go with the harder solution. >> > >> > Is disabling building C++ tests by default a reasonable change? >> >> I don't think it is worth the effort. Specifically, we're not really >> equiped to actually do it currently, and all in all, it's not taking that >> much of the build time relative to all the other stuff. The one C++ test >> that has a significant build time impact is gtest, and we're already >> skipping it. This is a relevant comment. >> I'm more interested in growing the faster-make backend to deal with C++ >> compilation in a sane way than working around the recursive-make >> deficiencies like this. > >+10. > >When I do build Gecko, I need the C++ unit tests, so this wouldn't save >me time at all. Are these tests being built every time someone does an >incremental build or just when someone changes something in the test? They're being *built* each time if anything they depend on changes, I believe, on every incremental build. Ones that don't depend on anything that changed don't seem to be built. Most devs who build/rebuild firefox will not run the cpp unittests from that build. Even those of us who do use them don't generally do it on each build cycle for testing something locally; often it's more of a pre-checkin pass. So disabling (or deferring) building them locally on "./mach build" could be a small win for 99% of developers - though a very small one usually; rebuilding the signaling unit tests after a file they depend on changed (5 separate unittests) appears to have taken around 0.05 seconds. If I touch all the cpp files for the webrtc/mtransport tests (rare outside of clobber), it adds about 1.5-2 seconds (local 4-year-old XEON desktop linux, building >20 unittests). For a clobber build rebuilding all the cpp unittests for the tree could be noticable however. Because the win is small, it shouldn't be a priority, but requiring something like --enable-cpp-tests should be ok to do (and of course automation would do that) if/when someone finds time. >If the latter, then this seems like it's of pretty small value since >most builds are incremental. If the former, then this seems to point >to a problem in the build system. It appears to be the latter; I just poked something in editor and it didn't rebuild any of the cpp unittests. >Reading Sylvestre's mail, it seems like there is some thought to >disable them on at least some automation? I would be strongly opposed >to this. C++ unit tests are the main test suite for WebRTC, so this >would be bad. Sylvestre seems to be thinking about moving them to a separate suite, and disabling *running* them in every cycle of automation (which wasn't the request above). Also, they are already a separate testsuite (cppunit) IIRC; they no longer run as part of the build itself. We very much need our cppunit tests to get run when things that might break them change. That most certainly isn't everything, note. One smart thing that *could* be done for automation (and save a *bunch* of VM/tester time) would be to not re-run cppunit tests that didn't get rebuilt - since inbound/etc do incremental builds by default, many checkins rebuild few or none of the cppunit tests, so running the same executable again has limited value (not none, but limited). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to ship: Canvas CaptureStream
>On Fri, Sep 18, 2015 at 9:02 PM, Ms2ger wrote: > >> That bug does not seem to support the claim that they intend to ship >> this feature. No intent-to-ship is mentioned, and it appears the >> intent-to-implement has not been sent. >> > >Here's evidence they intend to implement: >https://docs.google.com/document/d/1JmWfOtUP6ZqsYJ--U8y0OtHkBt-VyjX4N-JqIjb1t78/edit#heading=h.2ki8zk71f7w2 The working group originally gave broad consensus to this a year ago at an interim with Google and Microsoft in agreement there. We just held another interim and covered this, with focus on nits in the wordings and a few surprises we found in implementing it which needed minor adjustments. Again, Google and Microsoft were directly involved, and there was no opposition to moving forward and finishing it. Everyone I've talked to in the WG involved expects it to be widely implemented (and I believe Edge will be implementing it; MS has been involved all along, and have recently started doing a lot of implementation work related to WebRTC including VP8 and Opus support. They're very clearly focused on using WebRTC/ORTC to implement conferencing solutions (i.e. Lync and Skype), and this feature helps with that.) We showed demos of canvas.captureStream() at KrankyGeek in SF last week, to major interest from developers, and at the event fielded questions from Google engineers and Google's lead product person for WebRTC about the interaction with DRM (which is fine; it can't be used to break EME). I think it's clear that Google intends to implement and ship it, and I believe Microsoft will do likewise in Edge. It adds a ton of power to WebRTC and really (along with the associated mediaElement.captureStream/etc) fills out the "ecosystem' of media routing and ability to transform it (as we can with WebAudio & MediaStreams). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Announcing the Content Performance program
>> I was under the impression that because e10s is only a single process for >all content (at least right now) a background tab can still negatively >affect the foreground tab. > >That's right, but we also tested e10s in the process-per-tab configuration There are other options for background tab effects on foreground, though all have some sort of impact. (For example, you could 'freeze' background tabs more aggressively than we currently do. This may have some impact on functionality - we already limit timeouts, but IIRC we don't turn them off entirely, and other events still occur that we could defer until the tab is touched or ignore - but again, that could cause sites to break or make switching back problematic/annoying. However, I'm out of touch with the exact state of our 'freezing' of background tabs. And there are certainly sites where they'll eat your performance (eventually) unless you block everything. >> Have we ever considered building something like the unload tab addon into >the platform or Firefox directly? > >We have talked about it (BarTab Heavy is another example) and the code that >Yoric wrote for measuring per-compartment main-thread CPU-usage in >about:performance could be used for this. It's unclear how to prioritize it >though because doing 100% reliable heavy tab detection will require >non-trivial effort (see issues with slow-addon info bar) and the background >tab problem will mostly be mitigated by process-per-tab e10s. The problem with auto-unloaders is that you don't want that for some sites - real context or data can be lost. Yes, you can let users white-list things, but only (some) power users would even realize that this is an option or needed. I think serious care needs to be given when thinking of this as a default (or even as a particularly easy option). I think flagging sites as causing slowdown/etc is a great thing and good first step, which puts the control in the user's hands. A nice fire animation on the tab perhaps? ;-) Seems we used to have those in tinderbox >> https://addons.mozilla.org/en-US/firefox/addon/unloadtab/ >> >> """ >> An essential add-on for power users that have many tabs open. Curb your >> resource usage by unloading tabs that you haven't visited for a while. An >> unloaded tab is restored back to its original state when you need it again. >> """ "original state" likely doesn't include sites that dynamically modify themselves, which is a lot of sites. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: SpiderMonkey and XPConnect style changing from |T *p| to |T* p|
>On 3/30/15 6:08 PM, Kan-Ru Chen (陳侃如) wrote: >>>> >>Is there an hg or git equivalent to Perforce's "Time-Lapse View" tool? It >>>> >>shows a global view of a file's blame history and you can visually >>>> >>"scrub" >>>> >>through the blame history. >>>> >> >>> > >>> >You mean like git log -p FILE? I'm sure there are fancier graphic ways to >>> >visualize it, but that's the way I usually scrub my way back through >>> >restyles etc. >> I think Chris might want to follow the history of a particular line, >> that is usually what we use flame for. I find |git gui blame | >> very useful. > >Just to clarify what I meant by "visually scrub through blame history", >here is a video demo of Perforce's GUI: > >https://youtu.be/r1Zk5IaHZBQ?t=1m4s *WANT* :-) Seriously, is anything close to this available for hg or git? -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Replacing PR_LOG levels
>On Sat, May 23, 2015 at 4:46 AM, Randell Jesup wrote: >> This is used extensively in WebRTC and related media bits to enable >> *huge* amounts of debugs (like every-frame debugs for audio or video or >> per-network-packet debugs, which will swamp a system normally), and since >> people are used to enabling "debug" on random modules (or all:5), having >> verbose debugs in the "normal" :5 setting will cause pain. > >Can this be controlled some other way? Via a #define? Via another >environment variable? Definitely *not* a #define. That makes it close-to-useless. >I ask because in discussions with Eric I've been encouraging him to >reduce the number of logging levels as much as possible. It would be >sad to have to complicate the general mechanism because one module >does things differently to the rest of the system. My point (as repeated here in other posts) is that you're reducing complexity (in number of log levels) by adding complexity elsewhere (and more total complexity, IMHO). I'd consider your suggestion above an example of "complicating the general mechanism", but doing it indirectly so it can *appear* to be simple (but isn't really). This isn't a single module using values above DEBUG - this is *all* over the media modules in different debug tags. Audio, MediaDecoder, MediaStreamGraph, MediaManager, WebM, etc. Also, the current patch misses a number of uses like MediaEngine: #define LOG(msg) PR_LOG(GetMediaManagerLog(), PR_LOG_DEBUG, msg) #define LOGFRAME(msg) PR_LOG(GetMediaManagerLog(), 6, msg) (yes, this uses 6, such that the common habit of blah:5 logging doesn't turn on the spam - you have to *want* it) I like fixing up logging! I just think shoehorning a level *under* DEBUG confuses things, while adding a level *above* DEBUG would let up standardize/regularize current practice (and avoid the (temporary) "which version am I using" problem.) /me stops arguing having said his piece -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Replacing PR_LOG levels
>Thanks Adam, these are good points. > >On Friday, May 22, 2015 at 2:08:33 PM UTC-7, Adam Roach wrote: >> On 5/22/15 15:51, Eric Rahm wrote: >> > I agree, we shouldn't make it harder to turn on logging. The easy >> > solution is just to add a separate logger for verbose messages (if we >> > choose not to add Verbose/Trace). >> >> I don't know why we wouldn't just add a more verbose log level (Verbose, >> Trace... I don't care what we call it). The presence of "DEBUG + 1" in >> our code is evidence of a clear, actual need. > >This was my thought as well initially (my first proposal had a Verbose >level). After auditing the code I changed my mind. > >I believe adding the Info level will serve this need. The feeling I got >from most code that did 'DEBUG + 1' was that they needed to use DEBUG for >the informational stuff (because there was no Info level) and wanted a >lower level for the more spammy stuff (DEBUG was occupied, so fine DEBUG + >1). > >This was exemplified with the many variations of: > #define LOGI(...) PR_LOG(..., PR_LOG_DEBUG) > #define LOGV(...) PR_LOG(..., PR_LOG_DEBUG + 1) > >Note: For LOGI, to me at least, 'I' => Info >Flipside of course the 'V' => Verbose, but I think this a semantic >issue. Debug and Verbose are essentially the same thing now that we have >Info. Only if you avoid the "which module debug pattern am I in" question by switching *all* non-verbose uses of PR_LOG_DEBUG to PR_LOG_INFO (basically remap DEBUG to INFO, so that verbose logging can own the "DEBUG" level... but this seems way overkill (and liable to cause horrible ongoing confusion) compared to allowing a "PR_LOG_VERBOSE" level. PR_LOG_VERBOSE is clear and understandable. Modules where normal debugs are on INFO and you-likely-don't-want-it is on DEBUG will cause ongoing mental anguish and mismatch, and confusion for new contributors. >We can see this ambiguity in various logging frameworks, generally there's >a debug, but if there's not there's a verbose/trace, and usually not both. Though Android has both. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Replacing PR_LOG levels
>Thanks Randell, these are good points. I'll address a few of your comments >that have not already been covered in the rest of the conversation. > >> This is used extensively in WebRTC and related media bits to enable >> *huge* amounts of debugs (like every-frame debugs for audio or video or >> per-network-packet debugs, which will swamp a system normally), and since >> people are used to enabling "debug" on random modules (or all:5), having >> verbose debugs in the "normal" :5 setting will cause pain. >> > >Here lies half the problem, that number does not mean what we were led to >think it means. Using :5 would include PR_LOG_DEBUG + 1 [1]. > >> The above will also be surprising since it will work "differently" than >> other modules, making the same sorts of debugs appear at different >> levels. This would be expecially confusing to people not frequently >> working in the module or when external users are asked to turn on >> debugging. > >This assumes log levels have been used consistently across modules, I can >assure you that is not the case :) I hope that we can get to a more >consistent state of course! Sure - but having "debug" logs at DEBUG in module A, and INFO in module B (with "verbose-you-only-want-them-if-you-want-insane-data" being at DEBUG in B) will cause confusion. It means "turn on logging for X" requires knowledge of which of two patterns X follows. It also means all:4 (or all:debug) will cause suprising (and useless) results. And when coding, it'll be easy to use the "wrong" pattern when switching between modules. I see serious pain in a two-model system. And we have real *need* for logging at a makes-the-browser-almost-unusable volume to track down certain sorts of things. Even splitting say mediamanager:5 into mediamanager:4 (or :debug) and mediamanager_verbose:4 (or :debug) will have the same problem for all:. And it will make getting verbose logging more painful for no real reason. >I don't think it will be any more confusing when telling the external user >what level to turn on (it's already rather confusing). Do we need super >spammy? Turn on debug. Do we need just the basics? Turn on info. > >> Then there's an oddball: webrtc.org "Trace" logging (a separate >> subsystem with low-overhead buffered-IO circular-log maxed at 10MB), >> which requires a 16-bit mask. Currently this is exposed as >> webrtc_trace:65535 (for example), with a separate env var telling it >> where to put the logfile (or 'nspr' to revector the logs through NSPR >> logging, which *really* will cause problems with realtime code but is >> handy at times). For this oddball, we could do other things like move >> it to a separate env var (which is annoying to people using it, but not >> horribly). > >I think it's probably best to keep this a separate env rather than trying to >shoehorn it in. Understandable - but I've been trying to move in the opposite direction; avoiding N different env vars (and also, if I can avoid that, I have a better chance of leveraging on-the-fly log level changes without having to reinvent the wheel). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Replacing PR_LOG levels
>As part of the effort to improve logging in gecko we'd like to introduce a new >set of unified log levels. >*PR_LOG_DEBUG + 1 aka log level 5* > >Various bits of code invented a log level that was less important than >debug (I would call this verbose). This was not specified in NSPR logging, >but just kind of worked. With the addition of |Info| we can transition code >that was using this pseudo log level to: > - map PR_LOG_DEBUG => Info > - map PR_LOG_DEBUG + 1 => Debug > >In this case we could have added a |Verbose| log level, but with the >addition of |Info| and feeling that fewer is better we have decided against >that. This is used extensively in WebRTC and related media bits to enable *huge* amounts of debugs (like every-frame debugs for audio or video or per-network-packet debugs, which will swamp a system normally), and since people are used to enabling "debug" on random modules (or all:5), having verbose debugs in the "normal" :5 setting will cause pain. The above will also be surprising since it will work "differently" than other modules, making the same sorts of debugs appear at different levels. This would be expecially confusing to people not frequently working in the module or when external users are asked to turn on debugging. At minimum, I'd like to keep some form of Verbose (or VerboseDebug, or what have you) to avoid this "surprise" factor. The alternative would be to add more log types -- split signaling:6 into signaling:debug,signaling_verbose:debug for example. Somewhat annoying but works. I'd prefer a 'Verbose' setting. Then there's an oddball: webrtc.org "Trace" logging (a separate subsystem with low-overhead buffered-IO circular-log maxed at 10MB), which requires a 16-bit mask. Currently this is exposed as webrtc_trace:65535 (for example), with a separate env var telling it where to put the logfile (or 'nspr' to revector the logs through NSPR logging, which *really* will cause problems with realtime code but is handy at times). For this oddball, we could do other things like move it to a separate env var (which is annoying to people using it, but not horribly). >*How will log levels be controlled?* > >Similar to how it works now (if not terribly clear), turning on: > - LogLevel::Error => Error messages will be printed > - LogLevel::Warning => Error and Warning messages will be printed > - LogLevel::Info=> Error, Warning, and Info messages will be printed > - LogLevel::Debug => Error, Warning, Info, and Debug messages will be > printed what's the interface to NSPR_LOG_MODULES? still with the numerics, or do we have to use NSPR_LOG_MODULES=signaling:debug,mediamanager:debug,getusermedia:debug,mtransport:debug,mediastreamgraph:debug etc? (That gets a bit old... not that numerics are "better", but they're faster to type/read and lots of existing scripts/etc have them.) Obviously one could have numbers and names. >As a future improvement we plan to add the ability to toggle this levels at >runtime. This would be wonderful! Been looking for that for years (and years). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Runnables and thread-unsafe members
(was: Re: Proposal to ban the usage of refcounted objects inside C++ lambdas in Gecko) tl;dr: We should make DispatchToMainThread take already_AddRefed and move away from raw ptrs for Dispatches in general. So: What I want to avoid is this pattern for runnables that hold thread-restricted pointers (mostly MainThread-restricted, such as JS callbacks): FooRunnable::FooRunnable(nsCOMPtr& aFoo) { mFoo.swap(aFoo); // Never AddRef/Release off MainThread } ... { nsRefPtr foo = new FooRunnable(myFoo); // RefCnt == 1, myFoo.get() == nullptr now NS_DispatchToMainThread(foo); // infallible (soon), takes a ref to foo (refcnt 2) // XXX what is foo's refcnt here? You don't know! } The reason is that foo may go out of scope here *after* it has Run() and had the event ref dropped on mainthread. I.e.: at the XXX comment, it may "usually" be 2, but sometimes may be 1, and when foo goes out of scope we delete it here, and violate thread-safety for mFoo. Blindly taking an NS_DispatchToMainThread(new FooRunnable) and converting it to the pattern above (with nsRefPtr) will introduce a timing-based thread safety violation IF it holds thread-locked items (which isn't super-common). But also, it's extra thread-safe addref/releasing when we don't need to. A safe pattern is this: { nsRefPtr foo = new FooRunnable(myFoo); // RefCnt == 1, myFoo.get() == nullptr now // This requires adding a version of // NS_DispatchToMainThread(already_AddRefed) or some such NS_DispatchToMainThread(foo.forget()); // infallible (soon) // foo is nullptr } And to be honest, that's generally a better pattern for runnables anyways - we *want* to give them away on Dispatch()/DispatchToMainThread(). Note that you can also do ehsan's trick for safety instead: ~FooRunnable() { if (!NS_IsMainThread()) { // get mainthread NS_ProxyRelease(mainthread, mFoo); } } I don't love this though. It adds almost-never-executed code, we addref/release extra times, it could bitrot or forget to grow new mainthread-only refs. And it wastes code and adds semantic boilerplate to ignore when reading code. You could add a macro to build these and hide the boilerplate some, but that's not wonderful either. So, if I have my druthers: (And really only #1 is needed, probably) 1) Add already_AddRefed variants of Dispatch/DispatchToMainThread, and convert things as needed to .forget() (Preferably most Dispatch/DispatchToMainThreads would become this.) If the Dispatch() can fail and it's not a logic error somewhere (due to Dispatch to a doomed thread perhaps in some race conditions), then live with a leak - we can't send it there to die. DispatchToMainThread() isn't affected by this and will be infallible soon. 2) If you're building a runnable with a complex lifetime and a MainThread or thread-locked object, also add the ProxyRelease destructor. Otherwise consider simple boilerplate (for MainThread-only runnables) to do: ~FooRunnable() { if (!NS_IsMainThread()) { MOZ_CRASH(); } } (Perhaps DESTROYED_ON_MAINTHREAD_ONLY(FooRunnable)). Might not be a bad thing to have in our pockets for other reasons. 3) Consider replacing nsCOMPtr mFoo in the runnable with nsReleaseOnMainThread> mFoo and have ~nsReleaseOnMainThread() do NS_ProxyRelease, and also have nsReleaseOnMainThread block attempts to do assignments that AddRef (only set via swap or assign from already_AddRefed). Unlike the MainThreadPtrHolder this could be created on other threads since it never AddRefs; it only inherits references and releases them on MainThread I wonder if we could move to requiring already_AddRefed for DispatchToMainThread (and Dispatch?), and thus block all attempts to do DispatchToMainThread(new FooRunnable), etc. :-) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: PSA: Network 'jank' - get your blocking IO off of STS thread!
>Can we stop exposing the socket transport service's nsIEventTarget outside >of Necko? If we move media/mtransport to necko... or make an exception for it (and dom/network/UDPSocket and TCPSocket, etc). Things that remove loaded footguns (or at least lock them down) are good. Glad the major real problem was too-similar-names (I'd never heard of STREAMTRANSPORTSERVICE (or if I had, it had been long-forgotten, or mis-read as SOCKETTRANSPORTSERVICE)). -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: FYI: Serious windows bug affecting WebRTC when mem use is over 2GB
>Force a buffer in <2GB memory and always copy into/out of that buffer? That may work, other than it's inconvenient to force a buffer into <2GB space at the time when you need it, and thus we'd have to perma-allocate a "large enough" buffer for this. (Note GetAdapters*() is typically used with a first call to get the size, then a second to get the real data, or with a 16K-ish buffer then redone if that isn't enough.) Each user of GetAdapters*() (I don't think we have any uses of GetHostByName/Address) would need to pre-allocate a buffer during startup probably (or use a static buffer). You still would have to dynamically allocate one if the static buffer wasn't big enough, and that could then fail of course. Better than turning off >2GB memory, though. -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
PSA: Network 'jank' - get your blocking IO off of STS thread!
As a result of some conversations on IRC, it came to light that a bunch of blocking IO operations were apparently moved off of MainThread (presumably to avoid UI 'jank'), and instead Dispatched to STREAMTRANSPORTSERVICE (aka STS thread). Some examples pointed out to me: FilePicker, the spell-checker, the DeviceStorage DOM code, DOM cache code in Manager.cpp (via BodyStartWriteStream()), even perhaps ResolvedCallback in ServiceWorkers. (I haven't looked closely at all of the uses yet.) Good, right? No jank in the UI! No - now the jank is in the network code! STS is the primary network thread, and blocking that without a Darn Good Reason will cause all sorts of negative effects. Page loads, XHR, WebSockets, WebRTC (which will react by adding delay to cover the 'jitter' in apparent network RTT) and many other things will have small-to-large negative impacts. Perhaps some of these listed above (and others) don't block or even have a legitimate need to run on STS thread - ok, if so, comment as to why it's ok/needed. Everyone else should not be using STS thread as a convenient target It does point out that with the need to get a lot of blocking operations off of MainThread, it would help other modules to have a single service or ThreadPool for dumping such operations to. This would mean less code, less incorrect/undone thread shutdowns/etc. Thoughts? (Perhaps such a service could use LazyIdleThreads, which will shut themselves down after inactivity.) -- Randell Jesup, Mozilla Corp remove "news" for personal email ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform