Intent to ship: Honoring bogo-XML declaration for character encoding in text/html
This has now landed and is expected to ride the trains in Firefox 89.

For added historical context: Prior to HTML parsing getting specified, in addition to WebKit, Gecko and Presto also implemented this. At the time, the specification process paid too much attention to IE behavior as a presumed indicator of Web compatibility instead of looking at engine quorum. Like WebKit, Presto kept this behavior when implementing HTML5-compliant tokenization and tree building. That is, I was the only browser implementor fooled into removing this behavior as part of re-implementing parsing from the spec--not just the tokenization and tree building layers but also the input stream layer.

What can we learn? Instead of trusting the spec and trusting other implementors to loudly object to the parts of the spec they don't intend to follow, proactively check what the others are doing and adjust sooner.

On Wed, Mar 10, 2021 at 5:56 PM Henri Sivonen wrote:
> # Summary
>
> For compatibility with WebKit and Blink, honor the character encoding declared using the XML declaration syntax in text/html.
>
> For reasons explained in https://hsivonen.fi/utf-8-detection/ , unlike other encodings, UTF-8 isn't detected from content, so with the demise of Trident and EdgeHTML (which don't honor the XML declaration syntax in text/html), relying on the XML declaration alone has become a more notable Web compat problem for us. With non-Latin scripts, the failure mode is particularly bad for a Web compat problem: The text is completely unreadable.
>
> That is, this isn't a feature for Web authors to use. This is to address a push factor for users when authors do use this feature.
>
> # Bug
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=673087
>
> # Standard
>
> https://github.com/whatwg/html/pull/1752
>
> # Platform coverage
>
> All
>
> # Preference
>
> To be enabled unconditionally.
>
> # DevTools bug
>
> No integration needed.
>
> # Other browsers
>
> WebKit has had this behavior for a very long time and didn't remove it when HTML parsing was standardized.
>
> Blink inherited this from WebKit upon forking.
>
> Trident and EdgeHTML don't have this; their demise changed the balance for this feature.
>
> # web-platform-tests
>
> https://hsivonen.com/test/moz/xml-decl/ contains tests which are wrapped for WPT as part of the Gecko patch.
>
> --
> Henri Sivonen
> hsivo...@mozilla.com

--
Henri Sivonen
hsivo...@mozilla.com
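For illustration, the kind of page this change affects is a text/html document whose only encoding declaration uses the XML declaration syntax (the encoding label below is an arbitrary example):

    <?xml version="1.0" encoding="windows-1251"?>
    <!DOCTYPE html>
    <html>
      <head><title>Пример</title></head>
      <body>Текст по-русски</body>
    </html>

With this change, Gecko, like WebKit and Blink, honors encoding="windows-1251" here even though the document is parsed as HTML rather than XML.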
Intent to prototype: Honoring bogo-XML declaration for character encoding in text/html
# Summary

For compatibility with WebKit and Blink, honor the character encoding declared using the XML declaration syntax in text/html.

For reasons explained in https://hsivonen.fi/utf-8-detection/ , unlike other encodings, UTF-8 isn't detected from content, so with the demise of Trident and EdgeHTML (which don't honor the XML declaration syntax in text/html), relying on the XML declaration alone has become a more notable Web compat problem for us. With non-Latin scripts, the failure mode is particularly bad for a Web compat problem: The text is completely unreadable.

That is, this isn't a feature for Web authors to use. This is to address a push factor for users when authors do use this feature.

# Bug

https://bugzilla.mozilla.org/show_bug.cgi?id=673087

# Standard

https://github.com/whatwg/html/pull/1752

# Platform coverage

All

# Preference

To be enabled unconditionally.

# DevTools bug

No integration needed.

# Other browsers

WebKit has had this behavior for a very long time and didn't remove it when HTML parsing was standardized.

Blink inherited this from WebKit upon forking.

Trident and EdgeHTML don't have this; their demise changed the balance for this feature.

# web-platform-tests

https://hsivonen.com/test/moz/xml-decl/ contains tests which are wrapped for WPT as part of the Gecko patch.

--
Henri Sivonen
hsivo...@mozilla.com
Re: User-facing benefits from UA exposure of Android version and Linux CPU architecture
On Thu, Feb 18, 2021 at 11:26 PM Mike Hommey wrote:
> On Thu, Feb 18, 2021 at 01:51:07PM +0200, Henri Sivonen wrote:
> > Does reporting "Linux aarch64" have significant concrete benefits to users? Would actual presently-existing app download pages break if, for privacy, we always reported "Linux x86_64" on Linux regardless of the actual CPU architecture (or reported it on anything but 32-bit x86)?
>
> Would not exposing the CPU architecture be an option? Are UA sniffers expecting the UA format to include the CPU architecture?

In general, changing the format of the UA string is always riskier than freezing parts to some value that has been common in the past. I think finding out whether removal would be Web compatible is not worth the risk, churn, cost, and time investment. The attempt to take away the Gecko date was a costly episode of churn that left us with weird divergence between the desktop and mobile Gecko tokens. As far as removals from the UA string go, the main success is the removal of the crypto level token, but that removed an entire between-semicolons item from the middle of the list.

--
Henri Sivonen
hsivo...@mozilla.com
User-facing benefits from UA exposure of Android version and Linux CPU architecture
We currently expose the major version of Android and the CPU architecture on Linux in every HTTP request. That is, these are exposed to passive fingerprinting that doesn't involve running JS. (On Android, the CPU architecture is exposed via JavaScript but not in the HTTP User-Agent string. Exposing it probably doesn't help users, but the exposure probably doesn't contribute JS-reachable entropy on top of WebGL-exposed values.)

Previously, we've had problems from exposing the Android version. Back when Firefox still ran on Android versions lower than 4.4, we ended up reporting 4.4 for those to avoid discrimination by Web sites. I'm aware of one use case for the Android version that could be justified from the perspective of what users might want: deciding if intent links work. This only requires checking < 6 vs. >= 6, and Fenix still runs on 5.

Assuming that we accept the level of breakage that a device currently running Android 9 or 10 experiences relative to what Android 8 devices experience (9 and 10 have potential breakage from not having a dot and a minor version), what would break if we reported 5.0 for anything below 6 and the latest (currently 10) for anything 6 or above?

As for the CPU architecture on Linux: on Mac and Windows, we don't expose aarch64 separately. (On Windows, consistent with Edge, aarch64 looks like x86. On Mac, aarch64 looks like x86_64, which itself doesn't differ from what x86 looked like.)

The software download situation for each desktop platform is different: On Windows, an x86 stub installer can decide whether to install an x86, x86_64, or aarch64 app. On Mac, Universal Binaries make it irrelevant to know the CPU architecture at app download time. On Linux, downloads outside the distro's package manager typically involve the user having to choose from various options anyway due to tarball vs. .deb vs. .rpm vs. Flatpak vs. Snap, etc. OTOH, unlike on Windows and Mac, x86 or x86_64 emulation isn't typically automatically ready to work on Linux.

We don't ship official builds for aarch64 Linux (do we have plans to?), but we do have it as an option on try, and the configuration is becoming increasingly relevant in distro-shipped form. However, if we wanted to avoid the fingerprintability here, it would be good to take action before Linux on aarch64 is popular enough for us to ship official builds.

Does reporting "Linux aarch64" have significant concrete benefits to users? Would actual presently-existing app download pages break if, for privacy, we always reported "Linux x86_64" on Linux regardless of the actual CPU architecture (or reported it on anything but 32-bit x86)?

Historically, distros and BSDs have wanted to draw attention to themselves instead of letting their users hide in the crowd for privacy, which is unfortunate considering that the user cohort is already smaller than the Windows and Mac cohorts. Does that dynamic apply to ISAs? I.e., should we expect distro maintainers to undo the change if we made mozilla-central say "Linux x86_64" regardless of the actual ISA on Linux?

--
Henri Sivonen
hsivo...@mozilla.com
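For concreteness, the exposure in question is the platform segment of the User-Agent header sent with every request. These are illustrative examples of the general shape rather than exact strings from any particular build:

    Mozilla/5.0 (Android 9; Mobile; rv:86.0) Gecko/86.0 Firefox/86.0
    Mozilla/5.0 (X11; Linux aarch64; rv:86.0) Gecko/20100101 Firefox/86.0

The proposals above amount to freezing the "Android 9" token to "Android 5.0" or "Android 10" and the "Linux aarch64" token to "Linux x86_64".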
Intent to unship: Exposure of 11.x macOS versions in the User-Agent string
https://bugzilla.mozilla.org/show_bug.cgi?id=1679929 caps the macOS version exposed in the User-Agent string to 10.15.

What set this in motion is the Web compat impact of sites not expecting to see a version starting with 11. The reason for not exposing Big Sur as 10.16 is that Safari capped the version, capping the version has a privacy benefit going forward, and the utility of exposing the macOS version is questionable especially after Safari capped it.
https://bugs.webkit.org/show_bug.cgi?id=217364

As for why not do even better for privacy and report 10.15 on older versions of macOS: capping the version is the more prudent change.

Chrome will also cap the version like Safari in its UA string, but Chrome will expose the real version via Sec-CH-UA-Platform-Version, which neither Safari nor Firefox supports. (Client Hints in general and the Sec-CH-UA-* parts in particular are beyond the scope of this email.)
https://groups.google.com/a/chromium.org/g/blink-dev/c/hAI4QoX6rEo/m/qQNPThr0AAAJ

So far, the one cited use case for Web developers to distinguish Safari on Catalina from Safari on Big Sur relates to WebP support, but the issue is moot for both Firefox and Chrome, as Firefox and Chrome advertise WebP via the Accept header and don't rely on the OS decoder.

--
Henri Sivonen
hsivo...@mozilla.com
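For illustration, a capped UA string on Big Sur keeps the frozen Catalina token (the Firefox version here is an arbitrary example):

    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:85.0) Gecko/20100101 Firefox/85.0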
Re: Gecko performance with newer x86_64 levels
On Tue, Feb 9, 2021 at 5:35 PM Gian-Carlo Pascutto wrote:
> On 3/02/2021 10:51, Henri Sivonen wrote:
> > I came across https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level/ . Previously, when microbenchmarking Rust code that used count_ones() in an inner loop (can't recall what code this was), I noticed 4x runtime speed when compiling for target_cpu=nehalem and running on a much later CPU.
>
> That's an extreme edge case though.

It is an extreme edge case, but it's also a case where run-time dispatch doesn't make sense. The interesting thing is how much cases like this, plus LLVM using newer instructions on its own, would add up around the code base.

> > I'm wondering:
> >
> > Have we done benchmark comparisons with libxul compiled for the newly-defined x86_64 levels?
>
> No. Should be easy to do

In that case, it seems worth trying.

> but I don't expect much to come off of it. The main change (that is broadly applicable, unlike POPCNT) in recent years would be AVX. Do we have much floating point code in critical paths? I was wondering about the JS engine's usage of double for value storage - but it's what comes out of the JIT that matters, right?

AVX is much more recent than what's available after SSE2, which is our current baseline. Chrome is moving to SSE3 as the unconditional baseline, which I personally find surprising:
https://docs.google.com/document/d/1QUzL4MGNqX4wiLvukUwBf6FdCL35kCDoEJTm2wMkahw/edit#

A quick and very unscientific look at Searchfox suggests that unconditional SSE3 would mainly eliminate conditional/dynamic dispatch on YUV conversion code paths when it comes to explicit SSE3 usage. No idea how LLVM would insert SSE3 usage on its own.

> Media codecs don't count - they should detect at runtime. Same applies to crypto code, that - I really hope - would be using runtime detection for their SIMD implementations or even hardware AES/SHA routines.
>
> > For macOS and Android, do we actively track the baseline CPU age that Firefox-compatible OS versions run on and adjust the compiler options accordingly when we drop compatibility for older OS versions?
>
> Android only recently added 64-bit builds, and 32-bit would be limited to ARMv7-A. There used to be people on non-NEON devices, but those are probably gone by now. Google says "For NDK r21 and newer Neon is enabled by default for all API levels." - note that should be the NDK used for 64-bit builds.
>
> So it's possible Android could now assume NEON even on 32-bit, if it isn't already. Most of the code that cares (i.e. media) will already be doing runtime detection though.

I meant tracking baseline CPU age on the x86/x86_64 Android side. We have required NEON on Android ARMv7 for quite a while already.

> For macOS Apple Silicon is a hard break. For macOS on x86, I guess AVX is also the breaking point. There was an open question if any non-AVX hardware is still supported on Big Sur because Rosetta doesn't support AVX code, but given that we support (much) older macOS releases I don't think we can assume AVX presence regardless. We support back to macOS 10.12, which runs on "MacBook Late 2009", which was a Core 2 Duo. Guess we could assume SSSE3 but nothing more.

That's older than I expected, but it still seems worthwhile to make our compiler settings for Mac reflect that if they don't already. Also, doesn't the whole Core 2 Duo family have SSE 4.1?
--
Henri Sivonen
hsivo...@mozilla.com
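For readers who want to reproduce the effect: the phenomenon isn't Rust-specific. Below is an analogous minimal C++ sketch (not the original Rust microbenchmark). With a baseline like -march=nehalem (or -mpopcnt), __builtin_popcountll compiles to the single POPCNT instruction; with the plain x86-64 baseline, the compiler has to emit a much slower bit-twiddling fallback.

    #include <cstddef>
    #include <cstdint>

    // Sums the population count over a buffer. The inner-loop popcount is
    // where the instruction-selection difference shows up.
    uint64_t CountBits(const uint64_t* aWords, size_t aLength) {
      uint64_t sum = 0;
      for (size_t i = 0; i < aLength; i++) {
        sum += __builtin_popcountll(aWords[i]);
      }
      return sum;
    }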
Re: Intent to unship: FTP protocol implementation
On Wed, Feb 10, 2021 at 10:37 AM Valentin Gosu wrote:
> FTP support is currently disabled on Nightly.
>
> Our current plan is for the pref flip to ride the trains with Firefox 88 to beta and release [1], meaning we would be disabling FTP a week after Chrome [2]

Are we also stopping advertising the capability to act as an ftp: URL handler to operating systems? Currently, if I try to follow an ftp: URL in Gnome Terminal, it tries to launch Firefox. Is that something we advertise to Gnome or something that Gnome just knows and needs to be patched to stop knowing?

--
Henri Sivonen
hsivo...@mozilla.com
Gecko performance with newer x86_64 levels
I came across https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level/ . Previously, when microbenchmarking Rust code that used count_ones() in an inner loop (can't recall what code this was), I noticed 4x runtime speed when compiling for target_cpu=nehalem and running on a much later CPU.

I'm wondering:

Have we done benchmark comparisons with libxul compiled for the newly-defined x86_64 levels?

How feasible would it be, considering CI cost, to compile for multiple x86_64 levels and make the Windows installer / updater pick the right one and to use the new glibc-hwcaps mechanism on Linux?

For macOS and Android, do we actively track the baseline CPU age that Firefox-compatible OS versions run on and adjust the compiler options accordingly when we drop compatibility for older OS versions?

--
Henri Sivonen
hsivo...@mozilla.com
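For background on the glibc-hwcaps mechanism mentioned above: recent glibc versions let the dynamic linker prefer a library variant from a per-level subdirectory. A sketch of what shipping multiple libxul builds could look like (the paths are illustrative, not an actual packaging plan):

    /usr/lib/firefox/libxul.so                         # x86-64 baseline
    /usr/lib/firefox/glibc-hwcaps/x86-64-v2/libxul.so  # adds POPCNT, SSE4.2, ...
    /usr/lib/firefox/glibc-hwcaps/x86-64-v3/libxul.so  # adds AVX2, BMI2, ...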
Re: Status of Ubuntu 20.04 as a development platform
On Tue, Nov 10, 2020 at 4:39 PM James Graham wrote:
> On 10/11/2020 14:17, Kyle Huey wrote:
> > On Tue, Nov 10, 2020 at 3:48 AM Henri Sivonen wrote:
> > > Does Ubuntu 20.04 work properly as a platform for Firefox development? That is, does rr work with the provided kernel and do our tools work with the provided Python versions?
> >
> > rr works. I use 20.04 personally.
>
> I've also been using 20.04 and all the Python bits have worked fine.

Thanks. I upgraded, and both rr and Python-based tools work.

--
Henri Sivonen
hsivo...@mozilla.com
Re: Enabled CRLite in Nightly
On Fri, Nov 13, 2020 at 6:19 AM J.C. Jones wrote:
> Not yet, no. Neither this nor Intermediate Preloading (which CRLite depends on) are enabled in Fenix yet, as we have outstanding bugs about "only download this stuff when on WiFi + Power" and "that, but configurable."

If the delta updates are averaging 66 KB, do we really need to avoid the updates over cellular data even when that's assumed to be metered?

--
Henri Sivonen
hsivo...@mozilla.com
Status of Ubuntu 20.04 as a development platform
Does Ubuntu 20.04 work properly as a platform for Firefox development? That is, does rr work with the provided kernel and do our tools work with the provided Python versions?

--
Henri Sivonen
hsivo...@mozilla.com
Re: Please don't use functions from ctype.h and strings.h
On Wed, Jun 24, 2020 at 10:35 PM Chris Peterson wrote:
> On 8/27/2018 7:00 AM, Henri Sivonen wrote:
> > I think it's worthwhile to have a lint, but regexps are likely to have false positives, so using clang-tidy is probably better.
> >
> > A bug is on file: https://bugzilla.mozilla.org/show_bug.cgi?id=1485588
> >
> > On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter wrote:
> > > Is this something worth making a lint over? It's pretty easy to make regex-based lints, e.g.
> > >
> > > yml-only based lint:
> > > https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml
> > >
> > > yml+python for slightly more complicated regexing:
> > > https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml
> > > https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py
>
> Bug 1642825 recently added a "rejected words" lint. It was intended to warn about words like "blacklist" and "whitelist", but dangerous C function names could easily be added to the list:
>
> https://searchfox.org/mozilla-central/source/tools/lint/rejected-words.yml
>
> A "good enough" solution that can find real bugs now is preferable to a cleaner clang-tidy solution someday, maybe. (The clang-tidy lint bug 1485588 was filed two years ago.)

Thanks. Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1648390

--
Henri Sivonen
hsivo...@mozilla.com
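A hypothetical sketch of such an addition (the actual schema of rejected-words.yml may differ; the function list below is illustrative, not a vetted policy):

    # Illustrative only: locale-sensitive C function names to flag.
    avoid-words:
        - isalnum
        - isalpha
        - isdigit
        - tolower
        - toupper
        - strcasecmp
        - strncasecmp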
Please don't use locale-dependent C standard library functions (was: Re: Please don't use functions from ctype.h and strings.h)
This is an occasional re-reminder that anything in the C standard library that is locale-sensitive is fundamentally broken and should not be used. Today's example is strerror(), which returns a string that is meant to be rendered to the user, but the string isn't guaranteed to be UTF-8.

On Mon, Aug 27, 2018 at 3:04 PM Henri Sivonen wrote:
> Please don't use the functions from ctype.h and strings.h.
>
> See:
> https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/
> https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/
> https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c
>
> In addition to these being locale-sensitive, the functions from ctype.h are defined to take (signed) int with the value space of *unsigned* char or EOF, and other argument values are Undefined Behavior. Therefore, on platforms where char is signed, passing a char sign-extends to int and invokes UB if the most-significant bit of the char was set! Bug filed 15 years ago! https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of implementations doing anything surprising with this UB, but there exists precedent for *compiler* writers looking at the standard *library* UB language and taking calls into standard library functions as optimization-guiding assertions about the values of their arguments, so better not risk it.)
>
> For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h.
>
> For tolower() and toupper(), please use ToLowerCaseASCII() and ToUpperCaseASCII() from nsUnicharUtils.h
>
> For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed versions from nsCRT.h.
>
> (Ideally, we should scrub these from vendored C code, too, since being in third-party code doesn't really make the above problems go away.)
>
> --
> Henri Sivonen
> hsivo...@mozilla.com

--
Henri Sivonen
hsivo...@mozilla.com
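To make the ctype.h pitfall concrete, here is a minimal sketch (it assumes the mozilla::IsAsciiAlpha helper implied by the IsAsciiFoo() guidance above):

    #include <ctype.h>

    #include "mozilla/TextUtils.h"

    // 0xE4 is 'ä' in windows-1252; as a char it is negative on platforms
    // where char is signed.
    bool IsLetterUB(char aC) {
      return isalpha(aC);  // UB for negative aC: sign-extends outside the
                           // unsigned-char-or-EOF value space
    }

    bool IsLetterDefinedButLocaleSensitive(char aC) {
      // Defined behavior, but the answer still depends on the C locale.
      return isalpha(static_cast<unsigned char>(aC));
    }

    bool IsLetterGood(char aC) {
      return mozilla::IsAsciiAlpha(aC);  // locale-independent, no UB
    }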
Re: Proposal: remove support for running desktop Firefox in single-process mode (e10s disabled) anywhere but in tests
On Wed, Jun 10, 2020 at 11:13 PM James Teh wrote:
> In general, this obviously makes a lot of sense. However, because there is so much extra complication for accessibility when e10s is enabled, I find myself disabling e10s in local opt/debug builds to isolate problems to the core a11y engine (vs the a11y e10s stuff).

This is also relevant to other debugging scenarios, especially when not being able to use Pernosco to search for the right process. What does this proposal mean for ./mach run --disable-e10s ?

--
Henri Sivonen
hsivo...@mozilla.com
Re: Intent to unship: FTP protocol implementation
On Thu, Mar 19, 2020 at 2:24 AM Michal Novotny wrote:
> We plan to remove FTP protocol implementation from our code.

Chrome's status dashboard says "deprecated" and https://textslashplain.com/2019/11/04/bye-ftp-support-is-going-away/ said the plan was to turn FTP off by default in version 80. Yet, I just successfully loaded ftp://ftp.funet.fi in Chrome 80 on Mac and in Edge 82 (Canary) on Windows 10, and I'm certain I haven't touched the flag in either. (The location bar kept showing the ftp:// URL, so it doesn't appear to be a case of automatically trying HTTP.)

Do we know why Chrome didn't proceed as planned? Do we know what their current plan is? Do we know if Edge intends to track Chrome on this feature or to make an effort to patch a different outcome?

--
Henri Sivonen
hsivo...@mozilla.com
Re: Intent to ship: Autodiscovery of WebExtension search engines
On Tue, Feb 25, 2020 at 10:04 PM Dale Harvey wrote:
> Yes, extensions that only define a new search engine will be permitted; the extension will not be able to do anything else.

What capabilities do search engine-only WebExtensions have that OpenSearch doesn't provide?

--
Henri Sivonen
hsivo...@mozilla.com
Re: Intent to prototype: Character encoding detector
On Mon, Dec 2, 2019 at 2:42 PM Henri Sivonen wrote:
> 1. On _unlabeled_ text/html and text/plain pages, autodetect _legacy_ encoding, excluding UTF-8, for non-file: URLs and autodetect the encoding, including UTF-8, for file: URLs.
>
> Elevator pitch: Chrome already did this unilaterally. The motivation is to avoid a situation where a user switches to a Chromium-based browser as a result of browsing the legacy Web or local files.

Feature #1 is now on autoland.

# Preference

For file: URLs, I ended up not putting the new detector behind a pref, because the file: detection code is messy enough even without alternative code paths, and I'm pretty confident that the new detector is an improvement for our file: URL handling behavior.

For non-file: URLs, the new detector is overall controlled by intl.charset.detector.ng.enabled, which defaults to true, i.e. detector enabled. When the detector is enabled, various old intl.charset.* prefs are ignored in various ways.

The detector is, however, disabled by default for three TLDs: .jp, .in, and .lk. This can be overridden via the prefs intl.charset.detector.ng.jp.enabled, intl.charset.detector.ng.in.enabled, and intl.charset.detector.ng.lk.enabled, all three of which default to false. (These prefs cannot enable the detector if intl.charset.detector.ng.enabled is false.)

In the case of .jp, the pre-existing Japanese-specific detector is used. This avoids regressing how soon we start reloading if we detect EUC-JP.

The detector detects encodings that are actually part of the Web Platform. However, this can cause problems when a site expects the page to be decoded as windows-1252 _as a matter of undeclared fallback_ and expects the user to have an _intentionally mis-encoded_ font that assigns non-Latin glyphs to the windows-1252 code points. (Note that if the site says <meta charset=x-user-defined>, that continues to be undisturbed: https://searchfox.org/mozilla-central/rev/62a130ba0ac80f75175e4b65536290b52391f116/parser/html/nsHtml5StreamParser.cpp#1512 )

Chrome has detection for three windows-1252-misusing Devanagari font encodings and nine Tamil ones. (Nine looks like a lot, but a Python tool in this space is documented to handle 25 Tamil legacy encodings!) There is no indication that the Chrome developers found it necessary to have these detections. Rather, it looks like Chrome inherited them from Google search engine code. Actively-maintained newspaper sites that, according to old Bugzilla items, previously used these font hacks have migrated to Unicode.

Still, this leaves the possibility that there are sites that presently work (if the user has the appropriate fonts installed) in Chrome thanks to this detection and in Firefox thanks to Firefox mapping the .in TLD to windows-1252 and mapping .com to windows-1252 in the English localizations as well as in the localizations for the Brahmic-script languages of India. Not enabling the new detector on .in, at least for now, avoids disrupting sites that intentionally misuse windows-1252 without declaring it, if such sites are still used by users (at the expense of out-of-locale usage of .in as a generic TLD; data disclosed by Google as part of Chrome's detector suggests e.g. Japanese use of .in).

To the extent the phenomenon of relying on intentionally mis-encoded fonts still exists but on .com, the new detector will likely disrupt it (likely by guessing some Cyrillic encoding). However, I think it doesn't make sense to let that possibility derail this whole project/feature.
Although I believe this phenomenon to be mostly a Tamil-in-Tamil-Nadu thing rather than a general Tamil-language thing, I disabled the detector on .lk just in case, to have more time to research the issue. If reports of legacy Tamil sites breaking show up, please needinfo me on Bugzilla.

I didn't disable the detector for .am, because Chrome doesn't appear to have detections for Armenian intentional misuse of windows-1252.

If intl.charset.detector.ng.enabled is false, Japanese detection behaves like previously, except that encoding inheritance from a same-origin parent frame now takes precedence over the detector. (This was a spec compliance bug that had previously gone unnoticed because we hadn't run the full test suite with a detector enabled. It turns out that tests both semi-intentionally and accidentally depend on same-origin inheritance taking precedence as the spec says.)

In the interest of binary size, I removed the old Cyrillic detector at the same time as landing the new one. If the new detector is disabled but the old Cyrillic detector is enabled, the new detector runs in the situations where the old Cyrillic detector would have run, in a mode that approximates the old Cyrillic detector. (This approximation can, however, result in some non-Cyrillic outcomes that were impossible with the old Cyrillic detector.)

> # web-platform-tests

I added tests as tentative WPTs.

--
Henri Sivonen
hsivo...@mozilla.com
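For reference, the pref state described above can be written out as a user.js sketch (the values shown are the defaults described in this post):

    user_pref("intl.charset.detector.ng.enabled", true);
    // Per-TLD opt-ins; these cannot enable the detector if the
    // overall pref above is false.
    user_pref("intl.charset.detector.ng.jp.enabled", false);
    user_pref("intl.charset.detector.ng.in.enabled", false);
    user_pref("intl.charset.detector.ng.lk.enabled", false);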
Re: Intent to prototype: Character encoding detector
On Thu, Dec 5, 2019 at 8:08 PM Boris Zbarsky wrote:
> On 12/2/19 7:42 AM, Henri Sivonen wrote:
> > Since there isn't a spec and Safari doesn't implement the feature, there are no cross-vendor tests.
>
> Could .tentative tests be created here, on the off chance that we do create a spec for this at some point?

Good point. I'll use tentative WPTs for end-to-end automated tests. Thanks.

--
Henri Sivonen
hsivo...@mozilla.com
Intent to prototype: Character encoding detector
(My implementation relies on the browser already containing an implementation of the Encoding Standard. This cuts the binary size impact to less than one fourth compared to adopting the detector from Chrome, which doesn't benefit from any data tables that a browser already has to have anyway.)

I've gone with demonstrating feasibility before further cross-vendor discussion, because this is a user retention measure in response to a unilateral move on Chrome's part, and Safari on iOS doesn't face pressure from switching to browsers with a different Web engine.

# Platform coverage

All platforms.

# Preference

There will probably be one for an initial testing period, but I haven't picked a name yet.

# DevTools bug

There is no new DevTool surface for this. The HTML parser already complains in a DevTool-visible way about unlabeled pages, and this change will not remove those messages.

# Other browsers

Chromium-based browsers: Already shipping feature #1 (not shipping feature #2)

IE: Off-by-default (not precisely feature #1 or #2 but a kind of combination of the two).

Safari: Not shipping either feature but, like Firefox and unlike Chrome, provides a menu for addressing the use cases that feature #2 is meant to address.

# web-platform-tests

Since there isn't a spec and Safari doesn't implement the feature, there are no cross-vendor tests.

# Secure contexts

Since this pair of features is about compatibility with legacy content, both features apply to insecure contexts.

# Sandboxed iframes

The feature applies to sandboxed iframes.

For feature #1, the feature applies only to different-origin frames, and the situation is the same as for the pre-existing Japanese detection: The framer cannot turn off the feature for the framee. Either the framer or the framee can turn off the feature for itself by adhering to the HTML authoring conformance requirements, i.e. by declaring its own encoding.

For feature #2, the situation is the same as for the pre-existing menu: The top-level page can turn off the feature for the whole hierarchy by using UTF-8, not having any UTF-8 errors, and declaring UTF-8, or, alternatively, by using the UTF-8 BOM (even if there are subsequent errors). The framee can turn off the feature for itself by using the UTF-8 BOM.

--
Henri Sivonen
hsivo...@mozilla.com
Re: C++ standards proposal for a embedding library
On Thu, Oct 24, 2019 at 12:30 PM Gijs Kruitbosch wrote:
> From experience, people seriously underestimate how hard this is - things like "I want a URL bar" or "I want tabs / multiple navigation contexts and want them to interact correctly" or "users should be able to download files and/or open them in helper apps cross-platform" are considerably less trivial than most people seem to assume, and even as Mozilla we have (perhaps embarrassingly) repeatedly made the same / similar mistakes in these areas when creating new "embedders" from scratch (Firefox for iOS, FirefoxOS, the various Android browsers), or have had to go over all our existing separate gecko consumers to adjust them to new web specs (most recent example I can think of is same site cookies, for instance, which requires passing along origin-of-link information for context menu or similar affordances), which is non-trivial and of course cannot happen without embedding API adjustments.

The Mozilla cases are harder, because the applications we build around Web engines are Web browsers. My understanding (which may be wrong!) is that the purpose of the C++ proposal isn't to enable creating Web browsers around the API but to use the API to render the GUI for a local C++ app whose primary purpose isn't to browse the Web, so I assume "I want a URL bar" is the opposite of what the proposal is after.

But even so, the proposal is inadequate in addressing questions like multiple windows and various issues related to loading content from the network as opposed to content from the app's own URL scheme that maps to a stream produced by the C++ app internals. And it's inadequate even for the app-internal URL scheme: It looks like the app-internal URL scheme is expected to map a URL to a stream of bytes as opposed to a Content-Type and a stream of bytes.

--
Henri Sivonen
hsivo...@mozilla.com
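To illustrate that last point with a sketch (this is not the proposal's actual API; the names are made up for illustration): a scheme hook that can only produce bytes forces the engine to guess the media type, whereas the hook arguably needs to produce something more like this:

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical shape of what an app-internal URL scheme handler
    // would need to return: a media type alongside the bytes.
    struct Resource {
      std::string contentType;  // e.g. "text/html;charset=utf-8"
      std::vector<uint8_t> bytes;
    };

    using SchemeHandler = std::function<Resource(const std::string& aUrl)>;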
Re: C++ standards proposal for a embedding library
On Tue, Oct 22, 2019 at 11:55 PM Botond Ballo wrote:
> Given that, would anyone be interested in reviewing the proposed API and providing feedback on its design? I feel like the committee would be receptive to constructive technical feedback, and as a group with experience in developing embedding APIs, we are in a particularly good position to provide such feedback.
>
> The latest draft of the proposal can be found here:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1108r4.html

These comments are still more on the theme of this API being a bad idea for the standard library on the high level, as opposed to being design comments on the specifics.

Section 4.2 refers to Gecko's XPCOM embedding API, which hasn't been supported in quite a while. It also refers to the Trident embedding API. Putting aside for the moment that it's bad from a Mozilla perspective to treat a Web engine as a generic platform capability as opposed to a thing that one chooses from a number of competitive options, it seems bad to build on the assumption that on Windows the platform capability is Trident, which isn't getting new Web features anymore.

Further on the Trident point, it doesn't really work to both say that this is just exposing a platform capability and to go on to say that implementations are encouraged to support [list of Web specs]. This sort of thinking failed for the various Web-on-TV initiatives that got whatever Presto and WebKit had. If the mechanism here is the Trident embedding API on Windows, then you get whatever Trident has.

I'm curious how developers of libstdc++ and libc++ view the notion of WebKitGTK+ as a platform capability. The C++ standard libraries are arguably for Linux while the "platform capability" named (WebKitGTK+) is arguably a Gnome thing rather than a Linux-level thing. That WPE WebKit exists separately from WebKitGTK+ seems like a data point of some kind: https://wpewebkit.org/

Section 4.4 argues against providing a localhost Web server capability. I understand that the goal here is a user experience that differs from the experience of launching a localhost HTTP server and telling a Web browser to navigate to a localhost URL. However, the additional arguments about the HTTP/2 and HTTP/3 landscape evolving quickly and requiring TLS seem bad. For localhost communication, HTTP/1.1 should address the kind of use cases presented (other than the launch UX). Without actual network latencies, HTTP/2 and HTTP/3 optimizations over HTTP/1.1 aren't _that_ relevant. Also, the point about TLS doesn't apply to HTTP/1.1 to localhost. The notion that creating a multi-engine Web engine embedding API that allows the embedder to feed the Web engine pseudonetwork data is simpler than creating a localhost HTTP/1.1 server seems like a huge misestimation of the relative complexities.

Moreover, for either this API or a local HTTP server to work well, a better way of dynamically generating HTML than is presented in the example is required. It doesn't make sense to me to argue that everything belongs in the standard library to the point of putting a Web engine API there if a mechanism for generating the HTML to talk with the Web engine doesn't belong in the standard library. If the mechanism for generating the HTML is something you pull from GitHub instead of something you get in the standard library, why can't https://github.com/hfinkel/web_view be pulled from GitHub, too?
Finally, this seems very hand-wavy in terms of the security aspects of loading remote content in a Web engine launched like this. My understanding is that this has been an area of security problems for Electron apps. It seems irresponsible not to cover this in detail, but it also seems potentially impossible to do so, given the need to work with whatever Web engine APIs already exist as "platform capabilities".

--
Henri Sivonen
hsivo...@mozilla.com
Re: Intent to prototype: Web Speech API
On Tue, Oct 15, 2019 at 2:56 AM Andre Natal wrote:
> Regarding the UI, yes, the experience will be exactly the same in our case: the user will get a prompt asking for permission to open the microphone (I've attached a screenshot below [3])
>
> ...
>
> [3] https://www.dropbox.com/s/fkyymiyryjjbix5/Screenshot%202019-10-14%2016.13.49.png?dl=0

Since the UI is the same as for getUserMedia(), is the permission bit that gets stored the same as for getUserMedia()? I.e., if a site obtains the permission for one, can it also use the other without another prompt?

If a user understands how WebRTC works and what this piece of UI meant for WebRTC, this UI now represents a different trust decision on the level of principle. How intentional or incidental is it that this looks like a getUserMedia() use (audio goes to where the site named in the dialog decides to route it) instead of surfacing to the user that this is different (audio goes to where the browser vendor decides to route it)?

--
Henri Sivonen
hsivo...@mozilla.com
Re: Intent to ship: Web Speech API
On Sat, Oct 12, 2019 at 12:29 PM Andre Natal wrote:
> We tried to capture everything here [1], so please if you don't see your question addressed in this document, just give us a shout either here in the thread or directly.
>
> ...
>
> [1] https://docs.google.com/document/d/1BE90kgbwE37fWoQ8vqnsQ3YMiJCKJSvqQwa463yCN1Y/edit?ts=5da0f63f#

Thanks. It doesn't address the question of what the UI in Firefox is like. Following the links for experimenting with the UI on one's own leads to https://mdn.github.io/web-speech-api/speech-color-changer/ , which doesn't work in Nightly even with prefs flipped.

(Trying that example in Chrome shows that Chrome presents the permission prompt as a matter of sharing the microphone with mdn.github.io as if this was WebRTC, which suggests that mdn.github.io decides where the audio goes. Chrome does not surface that, if I understand correctly how this API works in Chrome, the audio is instead sent to a destination of Chrome's choosing and not to a destination of mdn.github.io's choosing. The example didn't work for me in Safari.)

--
Henri Sivonen
hsivo...@mozilla.com
Re: Passing UniquePtr by value is more expensive than by rref
On Mon, Oct 14, 2019 at 9:05 AM Gerald Squelart wrote:
> I'm in the middle of watching Chandler Carruth's CppCon talk "There Are No Zero-Cost Abstractions" and there's this interesting insight: https://youtu.be/rHIkrotSwcc?t=1041
>
> The spoiler is already in the title (sorry!), which is that passing std::unique_ptr by value is more expensive than passing it by rvalue reference, even with no exceptions!
>
> I wrote the same example using our own mozilla::UniquePtr, and got the same result: https://godbolt.org/z/-FVMcV (by-value on the left, by-rref on the right.)
>
> So I certainly need to recalibrate my gutfeelometer.

The discussion in the talk about what is needed to fix this strongly suggested (without uttering "Rust") that Rust might be getting this right. With panic=abort, Rust gets this right ( https://rust.godbolt.org/z/SZQaAS ), which really makes one appreciate both Rust-style move semantics and the explicitly non-committal ABI. (I had to put a side-effectful println! in bar to make sure a call to bar is generated, since #[inline(never)] isn't enough to prevent the compiler from eliding calls to functions it can see do nothing.)

--
Henri Sivonen
hsivo...@mozilla.com
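For reference, the shape of the comparison (mirroring the structure of the godbolt example; the function names here are made up):

    #include <memory>

    void Consume(int*);

    // By value: the unique_ptr has a non-trivial destructor, so it is
    // passed in memory rather than in a register, and a destructor call
    // has to be accounted for around the call.
    void TakeByValue(std::unique_ptr<int> aPtr) { Consume(aPtr.get()); }

    // By rvalue reference: effectively just a pointer is passed, and
    // ownership (including the destructor call) stays with the caller.
    void TakeByRref(std::unique_ptr<int>&& aPtr) { Consume(aPtr.get()); }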
Re: Intent to ship: Web Speech API
On Mon, Oct 7, 2019 at 5:00 AM Marcos Caceres wrote:
> - The updated implementation more closely aligns with Chrome's implementation - meaning we get better interop across significant sites.

What site can one try to get an idea of what the user interface is like?

> - speech is processed in our cloud servers, not on device.

What should one read to understand the issues that led to this change?

--
Henri Sivonen
hsivo...@mozilla.com
Non-XPCOM in-RAM Unicode representation conversions have moved
Unicode representation conversions (and range queries like "is this in the ASCII range?") for data we already have inside the engine (i.e., we're not doing IO with an external source/destination), in the cases where the conversion doesn't need to be able to resize a target XPCOM string, have moved. They are no longer in nsReadableUtils.h. They are now in mozilla/TextUtils.h, mozilla/Utf8.h, and mozilla/Latin1.h.

As a result, the functions have moved to the mozilla:: namespace and now follow MfbtCase (IsAscii and IsUtf8 instead of the old IsASCII and IsUTF8).

Support for external encodings and streaming continues to be in mozilla/Encoding.h. Conversions where the target is an XPCOM string remain in nsReadableUtils.h.

--
Henri Sivonen
hsivo...@mozilla.com
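A quick sketch of what call sites look like after the move (assuming the Span-taking overloads; check the headers for the exact signatures):

    #include "mozilla/Span.h"
    #include "mozilla/TextUtils.h"
    #include "mozilla/Utf8.h"

    bool CheckBuffer(mozilla::Span<const char> aBuffer) {
      // Formerly IsASCII()/IsUTF8() in nsReadableUtils.h; now MfbtCase
      // names in the mozilla:: namespace.
      return mozilla::IsAscii(aBuffer) || mozilla::IsUtf8(aBuffer);
    }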
Re: Intent to unship: TLS 1.0 and TLS 1.1
On Fri, Sep 13, 2019 at 3:09 AM Martin Thomson wrote:
> On Thu, Sep 12, 2019 at 5:50 PM Henri Sivonen wrote:
> > Do we know what the situation looks like for connections to RFC 1918 addresses?
>
> That's a hard one to even speculate about, and that's all we really have there. Our telemetry doesn't really allow us to gain insight into that.

I see.

> The big question being enterprise uses, where there is some chance of having names on servers in private address space. Most use of 1918 outside of enterprise is likely still unsecured entirely.

I was thinking of home printer, NAS and router config UIs that are unsecured in the sense of using self-signed certificates but that still use TLS, so that TLS matters for practical compatibility. I don't know of real examples of devices that both use TLS exclusively and don't support TLS 1.2. (My printer redirects http to https with a self-signed cert but supports TLS 1.2.)

--
Henri Sivonen
hsivo...@mozilla.com
Re: Intent to unship: TLS 1.0 and TLS 1.1
On Thu, Sep 12, 2019 at 7:03 AM Martin Thomson wrote:
> Telemetry shows that TLS 1.0 usage is much higher than we would ordinarily tolerate for this sort of deprecation

Do we know what the situation looks like for connections to RFC 1918 addresses?

> Finally, we will disable TLS 1.0 and 1.1 for all people using the Release channel of Firefox in March 2020. Exact plans for how and when this will happen are not yet settled.

What expectations are there for being able to remove the code from NSS?

--
Henri Sivonen
hsivo...@mozilla.com
Re: Proposed W3C Charter: Timed Text (TT) Working Group
On Thu, Aug 29, 2019 at 1:41 AM L. David Baron wrote:
> The W3C is proposing a revised charter for:
>
> Timed Text (TT) Working Group
> https://www.w3.org/2019/08/ttwg-proposed-charter.html
> https://lists.w3.org/Archives/Public/public-new-work/2019Aug/0004.html
>
> The comparison to the group's previous charter is:
> https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fwww.w3.org%2F2018%2F05%2Ftimed-text-charter.html&doc2=https%3A%2F%2Fwww.w3.org%2F2019%2F08%2Fttwg-proposed-charter.html

What should one read to understand what perceived need there is for further development on TTML and WebVTT? (That is, what's currently missing such that these aren't considered "done"?)

--
Henri Sivonen
hsivo...@mozilla.com
Re: JS testing functions and compartments in mochitest-plain
On Mon, Aug 26, 2019 at 1:37 PM Jan de Mooij wrote:
> On Mon, Aug 26, 2019 at 12:25 PM Henri Sivonen wrote:
> > Thanks. Since SpecialPowers doesn't exist in xpcshell tests, is there another way to reach JS testing functions from there?
>
> I think just Cu.getJSTestingFunctions() should work.

Thanks. This worked.

--
Henri Sivonen
hsivo...@mozilla.com
Re: JS testing functions and compartments in mochitest-plain
On Mon, Aug 26, 2019 at 11:27 AM Jan de Mooij wrote:
> On Mon, Aug 26, 2019 at 9:02 AM Henri Sivonen wrote:
> > In what type of test does SpecialPowers.Cu.getJSTestingFunctions().newRope() actually return a rope within the calling compartment such that passing the rope to a WebIDL API really makes the rope enter the WebIDL bindings instead of getting intercepted by a cross-compartment wrapper first?
>
> An xpcshell test or mochitest-chrome is probably easiest

Thanks. Since SpecialPowers doesn't exist in xpcshell tests, is there another way to reach JS testing functions from there?

--
Henri Sivonen
hsivo...@mozilla.com
JS testing functions and compartments in mochitest-plain
If in a plain mochitest I do

    var rope = SpecialPowers.Cu.getJSTestingFunctions().newRope(t.head, t.tail);
    var encoded = (new TextEncoder()).encode(rope);

the encode() method doesn't see the rope. Instead, the call to encode() sees a linear string that was materialized by a copy in a cross-compartment wrapper.

Does SpecialPowers always introduce a compartment boundary in a plain mochitest? In what type of test does SpecialPowers.Cu.getJSTestingFunctions().newRope() actually return a rope within the calling compartment such that passing the rope to a WebIDL API really makes the rope enter the WebIDL bindings instead of getting intercepted by a cross-compartment wrapper first?

Alternatively: What kind of string lengths should I use with normal JS string concatenation to be sure that I get a rope instead of the right operand getting copied into an extensible left operand?

--
Henri Sivonen
hsivo...@mozilla.com
Re: Structured bindings and minimum GCC & clang versions
On Fri, Aug 16, 2019 at 9:51 AM Eric Rahm wrote:
> We are actively working on this. Unfortunately, as expected, it's never as simple as we'd like. Updating the minimum gcc version (https://bugzilla.mozilla.org/show_bug.cgi?id=1536848) is blocked on getting our hazard builds updated; updating to c++17 has some of its own quirks.

Thanks. The dependencies indeed look tricky. :-(

I take it that doing what Chromium does and shipping a statically linked symbol-swapped copy of libc++ instead of depending on the system C++ standard library would have its own set of issues. :-(

--
Henri Sivonen
hsivo...@mozilla.com
Structured bindings and minimum GCC & clang versions
This week, I wrote some code that made me wish we already had support for structured bindings and return by initializer list (both from C++17) for mozilla::Tuple. That is, if we have

    mozilla::Tuple<A, B> Foo()

it would be nice to be able to call it via

    auto [a, b] = Foo();

and within Foo to write returns as

    return { a, b };

It appears that our minimum GCC and minimum clang documented at https://developer.mozilla.org/en-US/docs/Mozilla/Using_CXX_in_Mozilla_code are pretty old. What's the current outlook for increasing the minimum GCC and clang versions such that we could start using structured bindings and return by initializer list for tuples (either by making sure mozilla::Tuple supports these or by migrating from mozilla::Tuple to std::tuple) and thereby get ergonomic multiple return values in C++?

--
Henri Sivonen
hsivo...@mozilla.com
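For reference, a self-contained C++17 sketch of the two features with std::tuple (mozilla::Tuple would need equivalent support for structured bindings):

    #include <tuple>

    std::tuple<int, double> Foo() {
      return {42, 1.5};  // return by initializer list
    }

    void Caller() {
      auto [a, b] = Foo();  // structured bindings
      (void)a;
      (void)b;
    }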
Re: Removing --enable-shared-js [Was: Rust and --enable-shared-js]
On Tue, Aug 13, 2019 at 2:18 PM Lars Hansen wrote:
> Cranelift should be genuinely optional until further notice; to my knowledge, no near-term product work in Firefox or SpiderMonkey depends on Cranelift. Cranelift is present in Nightly but (so far as I can tell) not in Release. It can be disabled in the JS shell by configuring with --disable-cranelift, and I just tested that this works. To the extent there is other Rust code in SpiderMonkey, it should not, so far as I know, depend on the presence of Cranelift. It also seems to me that we should be able to use Rust in SpiderMonkey independently of whether Cranelift is there, so if that does not work it ought to be fixed.

Thanks. That makes sense to me.

The present state (now that https://bugzilla.mozilla.org/show_bug.cgi?id=1572364 has landed) is that when built as part of libxul, SpiderMonkey can use Rust code (jsrust_shared gets built) regardless of whether Cranelift is enabled. However, when SpiderMonkey is built outside libxul, SpiderMonkey can use Rust code (jsrust_shared gets built) only if Cranelift is enabled. I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1573098 to change that. (The actual addition of non-Cranelift Rust code of interest to jsrust_shared hasn't landed yet.)

--
Henri Sivonen
hsivo...@mozilla.com
Re: Non-header-only headers shared between SpiderMonkey and the rest of Gecko
On Tue, Aug 6, 2019 at 8:54 PM Kris Maglione wrote:
> On Tue, Aug 06, 2019 at 10:56:55AM +0300, Henri Sivonen wrote:
> > Do we have some #ifdef for excluding parts of mfbt/ when mfbt/ is being used in a non-SpiderMonkey/Gecko context?
>
> #ifdef MOZ_HAS_MOZGLUE

Thanks. This appears to be undefined for gtests that call into libxul code. (Is that intentional?) It appears that MOZILLA_INTERNAL_API is needed for covering gtests that call into libxul code.

As far as I can tell, after https://bugzilla.mozilla.org/show_bug.cgi?id=1572364 , ".cpp that links with jsrust_shared" is detected by:

    #if defined(MOZILLA_INTERNAL_API) || \
        (defined(MOZ_HAS_MOZGLUE) && defined(ENABLE_WASM_CRANELIFT))

Does that look right? I verified this experimentally by checking that the above evaluates to false when compiling --enable-application=tools/update-packaging or --enable-application=memory.

(Despite advice to the contrary, I still think it's important for discoverability to put my code in mfbt/TextUtils.h and have it disabled in contexts that don't link with jsrust_shared. It would be bad if, e.g., mozilla::IsAscii taking a single char and mozilla::IsAscii taking mozilla::Span weren't discoverable in the same place. Or even worse if mozilla::IsAscii taking '\0'-terminated const char* and mozilla::IsAscii taking mozilla::Span weren't discoverable in the same place.)

--
Henri Sivonen
hsivo...@mozilla.com
Re: Removing --enable-shared-js [Was: Rust and --enable-shared-js]
On Tue, May 28, 2019 at 3:16 AM Mike Hommey wrote:
> On Tue, May 21, 2019 at 10:32:20PM -0400, Boris Zbarsky wrote:
> > On 5/21/19 9:55 PM, Mike Hommey wrote:
> > > Considering this has apparently been broken for so long, I guess nobody will object to me removing the option for Gecko builds?
> >
> > It's probably fine, yeah...
>
> Now removed on autoland via bug 1554056.

Thanks.

It appears that building jsrust_shared is still conditional on ENABLE_WASM_CRANELIFT. How optional is ENABLE_WASM_CRANELIFT in practice these days? Is it genuinely optional for Firefox? Is it genuinely optional for standalone SpiderMonkey? If it is, are we OK with building without ENABLE_WASM_CRANELIFT having other non-Cranelift effects on SpiderMonkey performance (e.g. turning off SIMD for some operations) or on whether a particular string conversion is available in jsapi.h?

I'm trying to understand the implication of Cranelift being optional for other Rust code in SpiderMonkey. I'd like to add Rust-backed SIMD-accelerated Latin1ness checking and UTF-8 validity checking to SpiderMonkey and Rust-backed conversion from JSString to UTF-8 in jsapi.h, and my understanding from All Hands was that adding these things would be OK, since SpiderMonkey already depends on Rust.

--
Henri Sivonen
hsivo...@mozilla.com
Re: Non-header-only headers shared between SpiderMonkey and the rest of Gecko
On Tue, Aug 6, 2019 at 10:15 AM Henri Sivonen wrote:
> In general, it seems problematic to organize headers based on whether they have associated .cpp or crate code. I'd expect developers to look for stuff under mfbt/ instead of some place else, since developers using the header shouldn't have to know if the implementation is header-only or not. :-(

Notably, in my case the functions would logically belong in Utf8.h (which already has Utf8.cpp under mfbt/) and in TextUtils.h. What should my r+ expectations be if I put the entry points there despite the code requiring linking with Rust crates that SpiderMonkey (and, therefore, Gecko) depends on?

Do we have some #ifdef for excluding parts of mfbt/ when mfbt/ is being used in a non-SpiderMonkey/Gecko context?

--
Henri Sivonen
hsivo...@mozilla.com
Re: Non-header-only headers shared between SpiderMonkey and the rest of Gecko
On Mon, Aug 5, 2019 at 4:14 PM Gabriele Svelto wrote:
> On 05/08/19 12:04, Henri Sivonen wrote:
> > It has come to my attention that putting non-header-only code under mfbt/ is something we're trying to get away from: https://bugzilla.mozilla.org/show_bug.cgi?id=1554062
> >
> > Do we have an appropriate place for headers that declare entry points for non-header-only functionality (in my case, backed by Rust code) and that depend on types declared in headers that live under mfbt/ and that need to be available both to SpiderMonkey and the rest of Gecko?
>
> IIRC we have some stuff like that under mozglue/misc. The TimeStamp class for example is used in both Gecko and SpiderMonkey, has platform-dependent C++ implementations (also linking to external libraries) and uses MFBT headers.

Is mozglue only for Gecko and SpiderMonkey and not anything else? (I.e., not the crash reporter or anything else that doesn't link the Rust crates that SpiderMonkey links?)

In general, it seems problematic to organize headers based on whether they have associated .cpp or crate code. I'd expect developers to look for stuff under mfbt/ instead of some place else, since developers using the header shouldn't have to know if the implementation is header-only or not. :-(

--
Henri Sivonen
hsivo...@mozilla.com
Re: NotNull and pointer parameters that may or may not be null
On Mon, Jul 22, 2019 at 10:00 AM Karl Tomlinson wrote: > Google style requires pointers for parameters that may be mutated > by the callee, which provides that the potential mutation is > visible at the call site. Pointers to `const` types are > permitted, but recommended when "input is somehow treated > differently" [1], such as when a null value may be passed. > > Comments at function declarations should mention > "Whether any of the arguments can be a null pointer." [2] I understand that there's value in adopting Google rules without modification in order to avoid having to discuss which parts to adopt and to avoid having to adapt tooling in accordance with the results of such discussion. However, what should we think of Google disagreeing with the C++ Core Guidelines, which can be expected to also have larger ecosystem value as a tooling target and which in some sense could be considered to come from a higher C++ authority than Google's rules? On this particular topic, the Core Guidelines have: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-inout https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f60-prefer-t-over-t-when-no-argument-is-a-valid-option https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f23-use-a-not_nullt-to-indicate-that-null-is-not-a-valid-value (FWIW, I have relatively recently r+ed code that used non-const references to indicate non-nullable modifiable arguments on the grounds that it was endorsed by the Core Guidelines.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
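To make the contrast concrete, here is a minimal sketch (function names invented) of the three signature styles under discussion: a Google-style pointer out-parameter with nullability documented in a comment, a Core Guidelines F.60-style reference for a non-nullable in-out argument, and an F.23-style wrapper type; the hand-rolled NotNull below merely stands in for mfbt's mozilla::NotNull or GSL's gsl::not_null:

    #include <string>

    // Google style: mutation is visible at the call site because the
    // argument is a pointer; nullability lives in a comment.
    void AppendSuffix(const std::string& suffix,
                      std::string* out);  // out must be non-null

    // Core Guidelines F.60: use T& when "no argument" is not a valid option.
    void AppendSuffix(const std::string& suffix, std::string& out);

    // Core Guidelines F.23: encode non-nullness in the type.
    template <typename T>
    class NotNull {
     public:
      explicit NotNull(T ptr) : mPtr(ptr) {}  // real versions assert non-null
      T get() const { return mPtr; }
     private:
      T mPtr;
    };
    void AppendSuffix(const std::string& suffix, NotNull<std::string*> out);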
Non-header-only headers shared between SpiderMonkey and the rest of Gecko
It has come to my attention that putting non-header-only code under mfbt/ is something we're trying to get away from: https://bugzilla.mozilla.org/show_bug.cgi?id=1554062 Do we have an appropriate place for headers that declare entry points for non-header-only functionality (in my case, backed by Rust code) and that depend on types declared in headers that live under mfbt/ and that need to be available both to SpiderMonkey and the rest of Gecko? (So far, shipping headers that depend on types that come from mfbt/ inside the related crates.io crate has been suggested, but it seems weird to ship Gecko-specific code via crates.io and Gecko developers probably aren't looking for mfbt type-aware C++ API headers under third_party/rust/.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Coding style 🙄 : `int` vs `intX_t` vs `unsigned/uintX_t`
On Fri, Jul 5, 2019 at 1:28 PM Nathan Froyd wrote: > > On Fri, Jul 5, 2019 at 2:48 AM Jeff Gilbert wrote: > > It is, however, super poignant to me that uint32_t-indexing-on-x64 is > > pessimal, as that's precisely what our ns* containers (nsTArray) use > > for size, /unlike/ their std::vector counterparts, which will be using > > the more-optimal size_t. > > nsTArray uses size_t for indexing since bug 1004098. We should probably endorse the use of size_t more explicitly in our guidelines. Apart from the issue of object size motivating especially strings using uint32_t for _fields_, it seems to me that a significant part of our uint32_t (originally PRUint32, of course) habit comes from the days when Tru64 Unix was the main 64-bit Gecko platform, so we lacked proper 64-bit testing coverage. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
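A hedged illustration of the pessimization being discussed (actual codegen varies by compiler and loop shape): a uint32_t induction variable has defined 32-bit wraparound, which can force the compiler to re-extend the index to 64 bits before each address computation on x86-64, whereas size_t already matches the pointer width:

    #include <cstddef>
    #include <cstdint>

    // uint32_t index: defined wraparound can block turning a[i] into plain
    // pointer arithmetic, adding zero-extensions on x86-64.
    float SumA(const float* a, uint32_t n) {
      float sum = 0.0f;
      for (uint32_t i = 0; i < n; ++i) sum += a[i];
      return sum;
    }

    // size_t index: pointer-width, so a[i] is a direct address.
    float SumB(const float* a, size_t n) {
      float sum = 0.0f;
      for (size_t i = 0; i < n; ++i) sum += a[i];
      return sum;
    }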
Re: Coding style 🙄 : `int` vs `intX_t` vs `unsigned/uintX_t`
On Thu, Jul 4, 2019 at 9:55 AM Boris Zbarsky wrote: > > never use any unsigned type unless you work with bitfields or need 2^N > > overflow (in particular, don't use unsigned for always-positive numbers, > > use signed and assertions instead). > > Do you happen to know why? Is this due to worries about underflow or > odd behavior on subtraction or something? I don't _know_, but most likely they want to benefit from optimizations based on signed overflow being UB. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
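A standard illustration of that guess (textbook example, not from the thread): because signed overflow is undefined behavior, the optimizer may assume it never happens and fold checks away entirely, whereas unsigned arithmetic wraps and must actually be computed:

    // Signed: i + 1 > i may be folded to `true`, because overflow is UB
    // and the optimizer assumes it cannot occur.
    bool AlwaysTrue(int i) { return i + 1 > i; }

    // Unsigned: UINT_MAX + 1 == 0, so this is false for i == UINT_MAX and
    // the comparison has to be evaluated honestly.
    bool NotAlwaysTrue(unsigned i) { return i + 1 > i; }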
Re: Intent to unship: <keygen>
On Fri, Jun 14, 2019 at 1:24 PM Jonathan Kingston wrote: > Most of the use cases are resolved by web crypto or u2f. Thanks for the removal. Do we have enterprise Web developer-facing documentation on 1) how TLS client cert enrollment should work now or 2) if there is no in-browser client cert enrollment path anymore, what concretely should be used instead? (To be clear: I'm not a fan of client certs, and I'm not requesting that there be an enrollment path.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Running C++ early in shutdown without an observer
On Fri, Jun 7, 2019 at 9:45 PM Chris Peterson wrote: > > On 6/7/2019 9:36 AM, Kris Maglione wrote: > > On Fri, Jun 07, 2019 at 09:18:38AM +0300, Henri Sivonen wrote: > >> For late shutdown cleanup, we have nsLayoutStatics::Shutdown(). Do we > >> have a similar method for running things as soon as we've decided that > >> the application is going to shut down? > >> > >> (I know there are observer topics, but I'm trying to avoid having to > >> create an observer object and to make sure that _it_ gets cleaned up > >> properly.) > > > > Observers are automatically cleaned up at XPCOM shutdown, so you > > generally don't need to worry too much about them. That said, > > nsIAsyncShutdown is really the way to go when possible. But it currently > > requires an unfortunate amount of boilerplate. Thanks. (nsIAsyncShutdown indeed looks like it involves a lot of boilerplate.) > Note that on Android, you may never get an opportunity for a clean > shutdown because the OS can kill your app at any time. My use case relates to Windows only. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
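For readers unfamiliar with the boilerplate being weighed here, a rough sketch of the observer-based approach that the question was trying to avoid (the class name is invented; the topic string and registration idiom follow common Gecko usage, so treat the details as approximate):

    #include "mozilla/Services.h"
    #include "nsCOMPtr.h"
    #include "nsIObserver.h"
    #include "nsIObserverService.h"

    class EarlyShutdownCleanup final : public nsIObserver {
     public:
      NS_DECL_ISUPPORTS
      NS_DECL_NSIOBSERVER
      static void Register() {
        nsCOMPtr<nsIObserverService> obs =
            mozilla::services::GetObserverService();
        if (obs) {
          // "quit-application" fires once shutdown has been decided on;
          // the service drops strong observers at XPCOM shutdown.
          obs->AddObserver(new EarlyShutdownCleanup(), "quit-application",
                           false);
        }
      }
     private:
      ~EarlyShutdownCleanup() = default;
    };

    NS_IMPL_ISUPPORTS(EarlyShutdownCleanup, nsIObserver)

    NS_IMETHODIMP
    EarlyShutdownCleanup::Observe(nsISupports*, const char* aTopic,
                                  const char16_t*) {
      // Early-shutdown cleanup goes here.
      return NS_OK;
    }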
Running C++ early in shutdown without an observer
For late shutdown cleanup, we have nsLayoutStatics::Shutdown(). Do we have a similar method for running things as soon as we've decided that the application is going to shut down? (I know there are observer topics, but I'm trying to avoid having to create an observer object and to make sure that _it_ gets cleaned up properly.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Remove browser and OS architecture from Firefox's User-Agent string?
On Tue, May 21, 2019 at 12:40 AM Randell Jesup wrote: > > >On Fri, May 10, 2019 at 11:40 PM Chris Peterson > >wrote: > >> I propose that Win64 and WOW64 use the unadorned Windows UA already used > >> by Firefox on x86 and AArch64 Windows: > >> > >> < "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 > >> Firefox/66.0" > >> > "Mozilla/5.0 (Windows NT 10.0; rv:66.0) Gecko/20100101 Firefox/66.0" > > > >Would there be significant downsides to hard-coding the Windows > >version to "10.0" in order to put Windows 7 and 8.x users in the same > >anonymity set with Windows 10 users? > > > >(We could still publish statistics of Windows version adoption at > >https://data.firefox.com/dashboard/hardware ) > > I wonder if any sites distributing windows executables might key off the > OS version to default to the correct exe for your version of windows? Are there known examples of apps like that? AFAICT, even Skype.com provides the Windows 7-compatible .exe as the download even with a Windows 10 UA string, and you need to know to go to the Store if you want the Windows 10-only version. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Remove browser and OS architecture from Firefox's User-Agent string?
On Fri, May 10, 2019 at 11:40 PM Chris Peterson wrote: > I propose that Win64 and WOW64 use the unadorned Windows UA already used > by Firefox on x86 and AArch64 Windows: > > < "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 > Firefox/66.0" > > "Mozilla/5.0 (Windows NT 10.0; rv:66.0) Gecko/20100101 Firefox/66.0" Would there be significant downsides to hard-coding the Windows version to "10.0" in order to put Windows 7 and 8.x users in the same anonymity set with Windows 10 users? (We could still publish statistics of Windows version adoption at https://data.firefox.com/dashboard/hardware ) > And that Linux omit the OS architecture entirely (like Firefox on > Android or always spoof "i686" if an architecture token is needed for UA > parsing webcompat): Do we have any anecdata of the Web compat impact of not having anything between "Linux" and the next semicolon? Is there any evidence that " i686" would be a better single filler for everyone than " x86_64" if something is needed there for Web compat? Do we have indications if "Linux" is needed for Web compat? According to https://docs.google.com/spreadsheets/d/1I--o6uYWUkBw05IP964Ee2aZCf67P9E3TxpuDawH4_I/edit#gid=0 FreeBSD currently does not say "Linux". (Chrome on Chrome OS does not say Linux, either, but does say "X11; ".) That is, could "X11; " alone be sufficient for Web compat? (I'm happy to see that running Firefox in Wayland mode still says "X11; ". Let's keep it that way!) Do we have an idea if distros would counteract Mozilla and restore the CPU architecture if we removed it? Previous evidence suggests that distros are willing to split the anonymity set for self-promotional reasons by adding "; Ubuntu" or "; Fedora". Is there a similar distro interest in exposing the CPU architecture? https://docs.google.com/spreadsheets/d/1I--o6uYWUkBw05IP964Ee2aZCf67P9E3TxpuDawH4_I/edit#gid=0 suggests making Firefox on FreeBSD say "Linux". Are there indications that the self-promotion interests of FreeBSD wouldn't override privacy or Web compat benefits of saying "Linux"? > I propose no change to the macOS UA string at this time. Removing > "Intel" now would not reduce any fingerprinting entropy (all modern Macs > are x86_64) and might risk confusing some UA string parsers. If AArch64 > MacBooks become a real platform, I propose we then remove "Intel" so > x86_64 and AArch64 macOS would have the same UA string: > > < "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 > Firefox/66.0" > > "Mozilla/5.0 (Macintosh; Mac OS X 10.14; rv:66.0) Gecko/20100101 > Firefox/66.0". Or they could have the same UA string by Aarch64 saying "Intel"... Meanwhile, could we make the system version number "10.14" (or whatever is latest at a given point in time) regardless of actual version number to put all macOS users in the same anonymity set? (Curiously, despite Apple's privacy efforts, Safari exposes the third component of the OS version number. Also, it uses underscores instead of periods as the separator.) 
> Here is a spreadsheet comparing UA strings of different browser and OS > architectures: > > https://docs.google.com/spreadsheets/d/1I--o6uYWUkBw05IP964Ee2aZCf67P9E3TxpuDawH4_I/edit#gid=0 The reference there to https://bugzilla.mozilla.org/show_bug.cgi?id=1169772 about exposing _some_ Android version number for Web compat says the reason not to make Firefox claim the same Android version for all users regardless of actual system version is that doing so would require bumping the version later: https://bugzilla.mozilla.org/show_bug.cgi?id=1169772#c36 It seems that for privacy reasons, we should claim the latest Android version for everyone even if it means introducing the recurring task of incrementing the number annually or so. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Personally opining that new C++ library facilities should not support non-UTF encodings
I sent this to SG16 (after tweaking it a bit to downplay UTF-16 and UTF-32). On Mon, Apr 22, 2019 at 1:19 PM Henri Sivonen wrote: > > (If you don't care about what I say about character encoding issues in > the context of C++ standardization outside my Mozilla activities, you > can save time by skipping the rest of this email.) > > On my own time outside my Mozilla activities, I was invited to read > and comment on the C++ standardization papers under the purview of the > Unicode Study Group (SG16) of the C++ committee. > > I've drafted a reply in which I opine that _new_ text processing > features (other than character encoding conversion facilities) in the > C++ standard library should only support UTF-8/16/32 and should not > seek to support non-UTF execution encodings and that conversion > facilities should have an API similar to encoding_rs's API: > https://hsivonen.fi/p/non-unicode-in-cpp.html > > Based on prior advice from Botond, I'm sending this heads-up here just > in case: If you have a reason why it would be bad from the Mozilla > perspective for me to send the above-linked document to SG16 even with > the personal-capacity disclaimer, please let me know. > > (I expect this to be non-controversial from the Mozilla perspective, > since we already treat the concept of C++ "execution encoding" as a > C++ design bug that we route around.) > > -- > Henri Sivonen > hsivo...@hsivonen.fi > https://hsivonen.fi/ -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Personally opining that new C++ library facilities should not support non-UTF encodings
(If you don't care about what I say about character encoding issues in the context of C++ standardization outside my Mozilla activities, you can save time by skipping the rest of this email.) On my own time outside my Mozilla activities, I was invited to read and comment on the C++ standardization papers under the purview of the Unicode Study Group (SG16) of the C++ committee. I've drafted a reply in which I opine that _new_ text processing features (other than character encoding conversion facilities) in the C++ standard library should only support UTF-8/16/32 and should not seek to support non-UTF execution encodings and that conversion facilities should have an API similar to encoding_rs's API: https://hsivonen.fi/p/non-unicode-in-cpp.html Based on prior advice from Botond, I'm sending this heads-up here just in case: If you have a reason why it would be bad from the Mozilla perspective for me to send the above-linked document to SG16 even with the personal-capacity disclaimer, please let me know. (I expect this to be non-controversial from the Mozilla perspective, since we already treat the concept of C++ "execution encoding" as a C++ design bug that we route around.) -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
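The API shape being advocated, sketched here in C++ with invented names (see the linked document for the real proposal): streaming conversion into caller-provided buffers that reports both how much input was consumed and how much output was produced, so that a sequence split across buffers never forces reallocation or hidden copying:

    #include <cstddef>
    #include <cstdint>

    enum class CoderResult {
      InputEmpty,  // consumed all input; call again with more
      OutputFull,  // output buffer filled; call again with a fresh buffer
    };

    struct ConversionStatus {
      CoderResult result;
      size_t bytesRead;     // input consumed
      size_t unitsWritten;  // output produced
    };

    class Utf8ToUtf16Decoder {
     public:
      // `last` signals end of stream, so a trailing incomplete sequence
      // can be reported (or replaced with U+FFFD) instead of buffered.
      ConversionStatus Decode(const uint8_t* src, size_t srcLen,
                              char16_t* dst, size_t dstLen, bool last);
      // ... state carried across calls for sequences split between buffers
    };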
Re: Intent to deprecate - linux32 tests starting with Firefox 69
On Tue, Apr 9, 2019 at 6:05 PM Gian-Carlo Pascutto wrote: > On top of that, we know that not all distros have telemetry enabled and > so we won't be counting those either (Debian is the largest). We get telemetry from Ubuntu, Fedora, and Arch (subject to choices made by the user). I'm not particularly worried about Debian, because x86_64 (as amd64) has had prominent (unlike Ubuntu; see below) availability from Debian pretty much for as long as x86_64 hardware has been available. But in any case, not enabling telemetry means not having representation in data-driven decision making. > At least current Ubuntu and Ubuntu LTS are still available in 32-bit: > https://www.ubuntu.com/download/alternative-downloads > > Ubuntu is our largest userbase (with telemetry...) The latest and the latest LTS are offered for 32-bit x86 only as netboot installers, and the latest LTS also as an upgrade from an earlier version, so Ubuntu now positions 32-bit x86 as more obscure than s390x, ppc64el, aarch64, and armv7hf. Ubuntu has already disabled automatic updates to 18.10 for 32-bit x86, because the expectation is that 18.04 will be the longest-supported release for 32-bit x86. I think the main problem with Ubuntu is that Ubuntu promoted 32-bit x86 downloads to users who didn't have the expertise to seek the x86_64 version for way longer than they, in my opinion, should have (though, in fairness, at that time, Mozilla also was promoting 32-bit x86 downloads over 64-bit). Users who installed Ubuntu at that time may have gotten a 32-bit distro even if the hardware would work just fine with the 64-bit version, and there are no upgrades from 32-bit to 64-bit other than reinstall. It's probably more Canonical's job than ours to communicate to those users that they should do a 64-bit reinstall. (In terms of security support from the Ubuntu repos, I find the situation for architectures that aren't tier-1 for us a bit concerning. Compare https://packages.ubuntu.com/search?suite=disco&searchon=names&keywords=firefox with https://packages.ubuntu.com/search?suite=cosmic&searchon=names&keywords=firefox . Despite 32-bit x86 having been demoted below s390x, ppc64el, aarch64, and armv7hf in terms of distribution images, 32-bit x86 gets better security support in the Ubuntu repos than s390x, ppc64el, aarch64, and armv7hf. So does security support in Ubuntu repos depend on our tier positioning or on Ubuntu's own positioning?) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Proposed W3C Charters: Internationalization (i18n) Working Group and Interest Group
On Tue, Apr 9, 2019 at 11:13 PM L. David Baron wrote: > > On Tuesday 2019-04-09 13:55 +0300, Henri Sivonen wrote: > > On Mon, Apr 8, 2019 at 11:32 PM L. David Baron wrote: > > > > > > The W3C is proposing revised charters for: > > > > > > Internationalization (i18n) Working Group > > > https://www.w3.org/2019/04/proposed-i18n-wg-charter.html > > > https://lists.w3.org/Archives/Public/public-new-work/2019Apr/0004.html > > > diff from previous charter: > > > https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fwww.w3.org%2FInternational%2Fcore%2Fcharter-2016.html&doc2=https%3A%2F%2Fwww.w3.org%2F2019%2F04%2Fproposed-i18n-wg-charter.html > > > > > > Internationalization (i18n) Interest Group > > > https://www.w3.org/2019/04/proposed-i18n-ig-charter.html > > > https://lists.w3.org/Archives/Public/public-new-work/2019Apr/0004.html > > > diff from previous charter: > > > https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fwww.w3.org%2FInternational%2Fig%2Fcharter-2016.html&doc2=https%3A%2F%2Fwww.w3.org%2F2019%2F04%2Fproposed-i18n-ig-charter.html > > > > > > Mozilla has the opportunity to send comments or objections through > > > Friday, May 3. > > > > > > Please reply to this thread if you think there's something we should > > > say as part of this charter review, or if you think we should > > > support or oppose it. (Absent specific comments, I'd be inclined to > > > support the charters, because I think the i18n work at W3C has been > > > generally effective.) > > > > Is the WG expected to continue to issue revisions to > > https://www.w3.org/TR/charmod-norm/ ? The document itself suggests > > sending comments via GitHub issues. I don't see a charter item that > > clearly covers the maintenance of this document. Should we ask for an > > item that ensures that the group is explicitly chartered to continue > > to maintain this document? > > I expect the wording at the start of section 2.1 (Normative > Specifications) that says: > > The formal documents produced by the Working Group are guidelines, > best practices, requirements, and the like. These are best > published as Working Group Notes. > > probably covers this. Or does this document not fit within that > description? Oops. Sorry. Yes, those sentences cover it. Thanks. (I searched for the case-insensitive string "note", but somehow I managed to miss the second sentence you quoted.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Proposed W3C Charters: Internationalization (i18n) Working Group and Interest Group
On Mon, Apr 8, 2019 at 11:32 PM L. David Baron wrote: > > The W3C is proposing revised charters for: > > Internationalization (i18n) Working Group > https://www.w3.org/2019/04/proposed-i18n-wg-charter.html > https://lists.w3.org/Archives/Public/public-new-work/2019Apr/0004.html > diff from previous charter: > https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fwww.w3.org%2FInternational%2Fcore%2Fcharter-2016.html&doc2=https%3A%2F%2Fwww.w3.org%2F2019%2F04%2Fproposed-i18n-wg-charter.html > > Internationalization (i18n) Interest Group > https://www.w3.org/2019/04/proposed-i18n-ig-charter.html > https://lists.w3.org/Archives/Public/public-new-work/2019Apr/0004.html > diff from previous charter: > https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fwww.w3.org%2FInternational%2Fig%2Fcharter-2016.html&doc2=https%3A%2F%2Fwww.w3.org%2F2019%2F04%2Fproposed-i18n-ig-charter.html > > Mozilla has the opportunity to send comments or objections through > Friday, May 3. > > Please reply to this thread if you think there's something we should > say as part of this charter review, or if you think we should > support or oppose it. (Absent specific comments, I'd be inclined to > support the charters, because I think the i18n work at W3C has been > generally effective.) Is the WG expected to continue to issue revisions to https://www.w3.org/TR/charmod-norm/ ? The document itself suggests sending comments via GitHub issues. I don't see a charter item that clearly covers the maintenance of this document. Should we ask for an item that ensures that the group is explicitly chartered to continue to maintain this document? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Proposed W3C Charter: Web & Networks Interest Group
On Mon, Apr 8, 2019 at 11:11 PM L. David Baron wrote: > > The W3C is proposing a new charter for: > > Web & Networks Interest Group > https://www.w3.org/2019/03/web-networks-charter-draft.html > https://lists.w3.org/Archives/Public/public-new-work/2019Mar/0010.html > > Mozilla has the opportunity to send comments or objections through > Friday, April 26. > > Please reply to this thread if you think there's something we should > say as part of this charter review, or if you think we should > support or oppose it. The phrasing of the "Application hints to the network" part of the charter suggests that the IG envisions the browser declaring preferences to the *network* rather than the other end point of the connection. Am I reading that part right? That seems contrary to the general trend, including Mozilla efforts, to encrypt things so that things aren't visible to the network between the end points and the tendency to consider it unwanted for the network to take actions other than making the packets travel between the end points. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to implement and experiment: Require user interaction for notification permission prompts
On Tue, Mar 19, 2019 at 3:15 PM Johann Hofmann wrote: > > In bug 1524619 <https://bugzilla.mozilla.org/show_bug.cgi?id=1524619> I > plan to implement support for requiring a user gesture when calling > Notification.requestPermission() [0] and PushManager.subscribe() [1]. What's the current status of getting a cross-browser definition for something being invoked in response to a user gesture? Does scrolling count as a user gesture? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent-to-Ship: Backward-Compatibility FIDO U2F support for Google Accounts
On Thu, Mar 14, 2019 at 8:12 PM J.C. Jones wrote: > It appears that if we want full security key support for Google > Accounts in Firefox in the near term, we need to graduate our FIDO U2F > API support from “experimental and behind a pref” I think it's problematic to describe something as "experimental" if it's not on a path to getting enabled. "Experimental and behind a pref" sounds like it's on track to getting enabled, so simultaneously 1) sites have a reason to believe they don't need to do anything for Firefox, since for now users can flip a pref and the feature is coming anyway, and 2) the feature still doesn't actually work by default for users--and, for this particular feature, the penalty when the experiment fails is getting locked out of an account. So I think it's especially important to move *somewhere* from the "experimental and behind a pref" state: Either to interop with Chrome to the extent required by actual sites (regardless of what's de jure standard) or to clear removal, so that sites neither assume they can just wait for the feature to get enabled nor expect users to flip a pref. As a user, I'd prefer the "interop with Chrome" option. > to either “enabled > by default” or “enabled for specific domains by default.” I am > proposing the latter. Why not the former? Won't the latter still make other sites wait in the hope that if they don't change, they'll get onto the list eventually anyway? > First, we only implemented the optional Javascript version of the API, > not the required MessagePort implementation [3]. This is mostly > semantics, because everyone actually uses the JS API via a > Google-supplied polyfill called u2f-api.js. Do I understand correctly that the part that is actually needed for interop is implemented? > As I’ve tried to establish, I’ve had reasons to resist shipping the > FIDO U2F API in Firefox, and I believe those reasons to be valid. > However, a multi-year delay for the largest security key-enabled web > property is, I think, unreasonable to push upon our users. We should > do what’s necessary to enable full security key support on Google > Accounts as quickly as is practical. This concern seems to apply to other services as well. > I’ve proposed here making the FIDO U2F API whitelist a pref. I can’t > say whether I would welcome adding more domains to it by default; I > think we’re going to have to take them on a case-by-case basis. What user-relevant problem is solved by having to add domains to a list compared to making the feature available to all domains? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Type-based alias analysis and Gecko C++
On Wed, Feb 27, 2019 at 10:20 AM Henri Sivonen wrote: > Given the replies to this thread and especially the one I quoted > above, I suggest appending the following paragraph after the first > paragraph of > https://developer.mozilla.org/en-US/docs/Mozilla/Using_CXX_in_Mozilla_code I've made the edit after checking with Ehsan. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: What is dom/browser-element/ used for?
I think I found a user in dev tools: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/devtools/client/responsive.html/components/Browser.js#140 On Thu, Feb 28, 2019 at 11:45 AM Henri Sivonen wrote: > > It appears dom/browser-element/ was created for Gaia. Is it used for > something still? WebExtensions perhaps? > > -- > Henri Sivonen > hsivo...@mozilla.com -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
What is dom/browser-element/ used for?
It appears dom/browser-element/ was created for Gaia. Is it used for something still? WebExtensions perhaps? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Type-based alias analysis and Gecko C++
On Tue, Feb 19, 2019 at 10:17 PM Gabriele Svelto wrote: > On the reverse I've seen performance regressions from using > -fno-strict-aliasing only in tight loops where the inability to move > accesses around was lengthening the critical path through the loop. > However this was on processors with very limited memory reordering > capabilities; my guess is that on today's hardware > -fno-strict-aliasing's impact is lost in the noise. Given the replies to this thread and especially the one I quoted above, I suggest appending the following paragraph after the first paragraph of https://developer.mozilla.org/en-US/docs/Mozilla/Using_CXX_in_Mozilla_code : On the side of extending C++, we compile with -fno-strict-aliasing. This means that when reinterpreting a pointer as a differently-typed pointer, you don't need to adhere to the "effective type" (of the pointee) rule from the standard when dereferencing the reinterpreted pointer. You still need to make sure that you don't violate alignment requirements and that, when dereferencing the pointer for reading, the data at the memory location pointed to forms a valid value when interpreted according to the type of the pointer. Likewise, if you write by dereferencing the reinterpreted pointer and the originally-typed pointer might still be dereferenced for reading, you need to make sure that the values you write are valid according to the original type. This issue is moot for e.g. primitive integers for which all bit patterns of their size are valid values. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
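To make the proposed paragraph concrete, a small illustration of what it permits and what it still forbids (illustrative code, not from the tree):

    #include <cstdint>

    // OK in Gecko under -fno-strict-aliasing: the pointee's "effective
    // type" is double, but we read it as uint64_t. Alignment matches and
    // every 64-bit pattern is a valid uint64_t value.
    uint64_t BitsOfDouble(const double* d) {
      return *reinterpret_cast<const uint64_t*>(d);
    }

    // Still NOT OK, even without strict aliasing: not every byte is a
    // valid bool, so the read below can produce an invalid value.
    bool FirstByteAsBool(const double* d) {
      return *reinterpret_cast<const bool*>(d);  // don't do this
    }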
Re: Type-based alias analysis and Gecko C++
On Fri, Feb 22, 2019 at 1:00 AM Jeff Walden wrote: > > On 2/17/19 11:40 PM, Henri Sivonen wrote: > > Rust, which combines the > > perf benefits of -fstrict-aliasing with the understandability of > > -fno-strict-aliasing? > > This is not really true of Rust. Rust's memory model is not really defined > yet https://doc.rust-lang.org/reference/memory-model.html but from what I've > been able to read as to how you're "supposed" to and "should" use the > language in unsafe code and through FFI, it *does* require the same sorts of > things as C++ in "you can't dereference a pointer/reference unless it > contains a well-formed value of the type of the pointer/reference". Just, > Rust has somewhat more tools that hide away this unsafety so you don't often > manually bash on memory yourself in that manner. Requiring a dereferenced pointer to point to a value that is well-formed according to the type of the pointer is *very* different from having requirements on how the value was written ("effective type"). E.g. all possible bit patterns of f64 are well-formed bit patterns for u64, so in Rust it's permissible to use a u64-typed pointer to access a value that was created as f64. However, in C++ (without -fno-strict-aliasing, of course), if the "effective type" of a pointee is double, i.e. it was written as double, it's not permissible to access the value via a uint64_t-type pointer. In fact, the Rust standard library even provides an API for such viewing: https://doc.rust-lang.org/std/primitive.slice.html#method.align_to The unsafety remark is not in terms of aliasing but in terms of *value* transmutability. The method is fully safe when U is a type for which all bit patterns of U's size are valid values. (I'm a bit disappointed that there isn't a safe method to that effect with a trait bound to a trait that says that all bit patterns are valid. Then primitive integers and SIMD vectors of integer lanes could implement that marker trait.) > As a practical matter, I don't honestly see how Rust can avoid having a > memory model very similar to C++'s, including with respect to aliasing, even > if they're not there *formally* yet. As far as I'm aware, Ralf Jung, who is working on the formalization, is against introducing *type-based* alias analysis to Rust. Unsafe Rust has informal and will have formal aliasing rules, but all indications are that they won't be *type-based*. See https://www.ralfj.de/blog/2018/08/07/stacked-borrows.html https://www.ralfj.de/blog/2018/11/16/stacked-borrows-implementation.html https://www.ralfj.de/blog/2018/12/26/stacked-borrows-barriers.html for the aliasing rule formulation that does not involve type-based alias analysis. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Type-based alias analysis and Gecko C++
On Fri, Feb 15, 2019 at 6:47 PM Ted Mielczarek wrote: > > On Fri, Feb 15, 2019, at 4:00 AM, Henri Sivonen wrote: > > How committed are we to -fno-strict-aliasing? > > FWIW, some work was done to investigate re-enabling strict aliasing a while > ago but it proved untenable at the time: > https://bugzilla.mozilla.org/show_bug.cgi?id=414641 The bug was closed with "Realistically, this is WONTFIX. Life is too short to figure out why -O3 breaks -fstrict-aliasing." That conclusion makes sense to me. Is there any reason to believe that strict-aliasing in clang would yield the kind of performance benefits that would outweigh the trouble of writing strict-aliasing-conformant code and the performance penalties of additional memcpy() required for strict-aliasing-conformant code? Out of curiosity: Do we know if WebKit and Chromium compile with or without strict-aliasing? On Fri, Feb 15, 2019 at 4:43 PM David Major wrote: > If we can easily remove (or reduce) uses of this flag, I think > that would be pretty uncontroversial. What are the UB implications of using it for some parts of the code but not for others in the context of LTO? If we have specific places where we'd need strict-aliasing for performance, shouldn't we write those bits in Rust, which combines the perf benefits of -fstrict-aliasing with the understandability of -fno-strict-aliasing? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Type-based alias analysis and Gecko C++
I happened to have a reason to run our build system under strace, and I noticed that we pass -fno-strict-aliasing to clang. How committed are we to -fno-strict-aliasing? If we have no intention of getting rid of -fno-strict-aliasing, it would make sense to document this at https://developer.mozilla.org/en-US/docs/Mozilla/Using_CXX_in_Mozilla_code and make it explicitly OK for Gecko developers not to worry about type-based alias analysis UB--just like we don't worry about writing exception-safe code. Debating in design/review or making an effort to avoid type-based alias analysis UB is not a good use of time if we're committed to not having type-based alias analysis. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Proposed W3C Charter: SVG Working Group
On Wed, Jan 23, 2019 at 11:23 AM Cameron McCormack wrote: > > On Thu, Jan 10, 2019, at 12:38 AM, Henri Sivonen wrote: > > A (non-changed) part of the charter says under SVG2: "This > > specification updates SVG 1.1 to include HTML5-compatible parsing". Is > > that in reference to > > https://svgwg.org/svg2-draft/single-page.html#embedded-HTMLElements or > > something else? I.e. does it mean the SVG WG wants to change the HTML > > parsing algorithm to put <video> and <audio> with > > non-integration-point SVG parent into the HTML namespace in the HTML > > parser? > > I see the note in that section you link that says: > > > Currently, within an SVG subtree, these tagnames are not recognized by the HTML parser to > > be HTML-namespaced elements, although this may change in the future. > > Therefore, in order > > to include these elements within SVG, one of the following must be used: > > ... > > The "this may change in the future" part sounds like someone thought that it might be the case in the future. Saying that SVG 2 "includes HTML5-compatible parsing" is a bit odd, though, since that behavior is defined in the HTML spec. In any case, given the group's intended focus on stabilizing and documenting what is currently implemented and interoperable, I doubt that making such a change would be in scope. Thanks. I think it would be prudent for Mozilla to request that "updates SVG 1.1 to include HTML5-compatible parsing" be struck from the charter, so that changes to the HTML parsing algorithm can't be justified using an argument from a charter that we approved. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to ship: TextEncoder.encodeInto() - UTF-8 encode into caller-provided buffer
On Mon, Jan 14, 2019 at 4:45 PM Boris Zbarsky wrote: > On 1/14/19 4:28 AM, Henri Sivonen wrote: > > This is now an "intent to ship". The feature landed in the spec, in > > WPT and in Gecko (targeting 66 release). > > Where do other browsers stand on this feature, do you know? I see active involvement in spec review and in test case reviewing and test validation using a polyfill from a Chromium developer and the bug (https://bugs.chromium.org/p/chromium/issues/detail?id=920107) indicates interest from another person to do the implementation. No signals from WebKit. https://bugs.webkit.org/show_bug.cgi?id=193274 -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Intent to ship: TextEncoder.encodeInto() - UTF-8 encode into caller-provided buffer
This is now an "intent to ship". The feature landed in the spec, in WPT and in Gecko (targeting 66 release). On Mon, Dec 17, 2018 at 9:12 AM Henri Sivonen wrote: > > # Summary > > TextEncoder.encodeInto() adds support for encoding JavaScript strings as UTF-8 > into caller-provided byte buffers. This is a performance optimization > for JavaScript/Wasm interop that allows the encoder output to be > written directly into Wasm memory without extra copies. > > # Details > > TextEncoder.encode() returns a DOM implementation-created buffer. This > involves copying internally in the implementation (in principle, this > copy could be optimized away with internal-only API changes) and then > yet another copy from the returned buffer into Wasm memory (optimizing > away this copy needs a Web-visible change anyway). > > TextEncoder.encodeInto() avoids both copies. > > https://bugzilla.mozilla.org/show_bug.cgi?id=1449849 combined with > TextEncoder.encodeInto() is expected to avoid yet more copying. > > The expectation is that passing strings from JS to Wasm is / is going > to be common enough to be worthwhile to optimize. > > # Bug > > https://bugzilla.mozilla.org/show_bug.cgi?id=1514664 > > # Link to standard > > https://github.com/whatwg/encoding/pull/166 > > # Platform coverage > > All > > # Estimated or target release > > 67 > > # Preference behind which this will be implemented > > Not planning to have a pref for this. > > # Is this feature enabled by default in sandboxed iframes? > > Yes. > > # DevTools bug > > No need for new DevTools integration. > > # Do other browser engines implement this > > No, but Chrome developers have been active in the spec discussion. > > # web-platform-tests > > https://github.com/web-platform-tests/wpt/pull/14505 > > # Is this feature restricted to secure contexts? > > No. This is a new method on an interface that predates restricting > features to secure contexts. > > -- > Henri Sivonen > hsivo...@mozilla.com -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Proposed W3C Charter: SVG Working Group
On Sun, Dec 23, 2018 at 7:59 PM L. David Baron wrote: > > The W3C is proposing a revised charter for: > > Scalable Vector Graphics (SVG) Working Group > https://www.w3.org/Graphics/SVG/svg-2019-ac.html > https://lists.w3.org/Archives/Public/public-new-work/2018Dec/0006.html (Not a charter comment yet. At this point a question.) A (non-changed) part of the charter says under SVG2: "This specification updates SVG 1.1 to include HTML5-compatible parsing". Is that in reference to https://svgwg.org/svg2-draft/single-page.html#embedded-HTMLElements or something else? I.e. does it mean the SVG WG wants to change the HTML parsing algorithm to put <video> and <audio> with a non-integration-point SVG parent into the HTML namespace in the HTML parser? (Even with evergreen browsers, changing the HTML parsing algorithm poses the problem that, if the algorithm is ever-changing, server-side software cannot make proper security decisions on the assumption that their implementation of the HTML parsing algorithm from some point in time matches the behavior of browsers.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Rust code coverage
On Fri, Jan 4, 2019 at 2:54 PM Marco Castelluccio wrote: > Hi everyone, > we have recently enabled collecting code coverage for Rust code too, Nice! > running Rust tests in coverage builds. Does this mean running cargo test for each crate under third_party/rust, running Firefox test suites or both? As for trying to make sense of what the numbers mean: Is the coverage ratio reported on lines attributed at all in ELF as opposed to looking at the number of lines in the source files? What kind of expectations one should have on how the system measures coverage for code that gets inlined? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Pointer to the stack limit
On Wed, Dec 19, 2018 at 2:37 PM David Major wrote: > You'll need platform-specific code, but on Windows there's > https://searchfox.org/mozilla-central/rev/13788edbabb04d004e4a1ceff41d4de68a8320a2/js/xpconnect/src/XPCJSContext.cpp#986. > > And, to get a sense of caution, have a look at the ifdef madness surrounding > the caller -- > https://searchfox.org/mozilla-central/rev/13788edbabb04d004e4a1ceff41d4de68a8320a2/js/xpconnect/src/XPCJSContext.cpp#1125 > -- to see the number of hoops we have to jump through to accommodate various > build configs. Thanks. It looks like the Android case just hard-codes a limit that works for Dalvik instead of querying from the OS or ever querying for the API level to decide between a limit that works for Dalvik and a limit that works for ART. (On IRC, I was pointed to the code that uses the limit: https://searchfox.org/mozilla-central/search?q=symbol:_ZN2js19CheckRecursionLimitEP9JSContext&redirect=false ) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Pointer to the stack limit
Is it possible to dynamically, at run time, obtain a pointer to the call stack limit? I mean the address that is the lowest address that the run-time stack can grow into without the process getting terminated with a stack overflow. I'm particularly interested in a solution that'd work on 32-bit Windows and on Dalvik. (On ART, desktop Linux, and 64-bit platforms we can make the stack "large enough" anyway.) Use case: Implementing a dynamic recursion limit. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
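For the Windows side, a hedged sketch: GetCurrentThreadStackLimits (Windows 8 and later; older Windows would need to read the limits out of the TEB instead) reports the current thread's stack range. The low limit includes not-yet-committed guard/reserve pages, so a safety margin is needed:

    #include <windows.h>
    #include <cstdint>

    // Lowest stack address the thread may grow into, plus a safety margin.
    static uintptr_t StackLimitWithMargin() {
      ULONG_PTR low, high;
      GetCurrentThreadStackLimits(&low, &high);  // Windows 8+
      return static_cast<uintptr_t>(low) + 128 * 1024;
    }

    // Dynamic recursion limit: refuse to recurse once the stack pointer
    // gets close to the limit.
    bool CanRecurse(uintptr_t limitWithMargin) {
      char probe;
      return reinterpret_cast<uintptr_t>(&probe) > limitWithMargin;
    }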
Intent to implement: TextEncoder.encodeInto() - UTF-8 encode into caller-provided buffer
# Summary TextEncoder.encodeInto() adds support for encoding JavaScript strings as UTF-8 into caller-provided byte buffers. This is a performance optimization for JavaScript/Wasm interop that allows the encoder output to be written directly into Wasm memory without extra copies. # Details TextEncoder.encode() returns a DOM implementation-created buffer. This involves copying internally in the implementation (in principle, this copy could be optimized away with internal-only API changes) and then yet another copy from the returned buffer into Wasm memory (optimizing away this copy needs a Web-visible change anyway). TextEncoder.encodeInto() avoids both copies. https://bugzilla.mozilla.org/show_bug.cgi?id=1449849 combined with TextEncoder.encodeInto() is expected to avoid yet more copying. The expectation is that passing strings from JS to Wasm is / is going to be common enough to be worthwhile to optimize. # Bug https://bugzilla.mozilla.org/show_bug.cgi?id=1514664 # Link to standard https://github.com/whatwg/encoding/pull/166 # Platform coverage All # Estimated or target release 67 # Preference behind which this will be implemented Not planning to have a pref for this. # Is this feature enabled by default in sandboxed iframes? Yes. # DevTools bug No need for new DevTools integration. # Do other browser engines implement this No, but Chrome developers have been active in the spec discussion. # web-platform-tests https://github.com/web-platform-tests/wpt/pull/14505 # Is this feature restricted to secure contexts? No. This is a new method on an interface that predates restricting features to secure contexts. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
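For readers of this list, the shape of the new method expressed as a C++-style signature (the actual API is JavaScript: encodeInto(string, Uint8Array) returns a dictionary with `read` and `written` members; this rendering is just to show the contract):

    #include <cstddef>
    #include <cstdint>

    struct EncodeIntoResult {
      size_t read;     // UTF-16 code units consumed from the source
      size_t written;  // UTF-8 bytes written to the destination
    };

    // Converts as much of src as fits in dst, never splitting a code
    // point's UTF-8 bytes across the end of dst. The caller can grow dst
    // (e.g. Wasm memory) and continue from src + result.read.
    EncodeIntoResult EncodeInto(const char16_t* src, size_t srcLen,
                                uint8_t* dst, size_t dstLen);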
Re: Intent to implement and ship: UTF-8 autodetection for HTML and plain text loaded from file: URLs
On Tue, Dec 11, 2018 at 10:08 AM Henri Sivonen wrote: > How about I change it to 5 MB on the assumption that that's still very > large relative to pre-UTF-8-era HTML and text file sizes? I changed the limit to 4 MB. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to implement and ship: UTF-8 autodetection for HTML and plain text loaded from file: URLs
On Tue, Dec 11, 2018 at 2:24 AM Martin Thomson wrote: > This seems reasonable, but 50M is a pretty large number. Given the > odds of UTF-8 detection failing, I would have thought that this could > be much lower. Consider the case of a document of ASCII text with a copyright sign in the footer. I'd rather not make anyone puzzle over why the behavior of the footer depends on how much text comes before the footer. 50 MB is intentionally extremely large relative to "normal" HTML and text files so that the limit is reached approximately "never" unless you open *huge* log files. The HTML spec is about 11 MB these days, so that's existence proof that a non-log-file HTML document can exceed 10 MB. Of course, the limit doesn't need to be larger than present-day UTF-8 files but larger than "normal"-sized *legacy* non-UTF-8 files. It is quite possible that 50 MB is *too* large considering 32-bit systems and what *other* allocations are proportional to the buffer size, and I'm open to changing the limit to something smaller than 50 MB as long as it's still larger than "normal" non-UTF-8 HTML and text files. How about I change it to 5 MB on the assumption that that's still very large relative to pre-UTF-8-era HTML and text file sizes? > What is the number in Chrome? It depends. It's unclear to me what exactly it depends on. Based on https://github.com/whatwg/encoding/issues/68#issuecomment-272993181 , I expect it to depend on some combination of file system, OS kernel and Chromium IO library internals. On Ubuntu 18.04 with ext4 on an SSD, the number is 64 KB. On Windows 10 1803 with NTFS on an SSD, it's something smaller. I think making the limit depend on the internals of file IO buffering instead of a constant in the HTML parser is a really bad idea. Also 64 KB or something less than 64 KB seems way too small for the purpose of making it so that the user approximately never needs to puzzle over why things are different based on the length of the ASCII prefix of a file with non-ASCII later in the file. > I assume that other local sources like chrome: are expected to be > annotated properly. From source inspection, it seems that chrome: URLs already get hard-coded to UTF-8 on the channel level: https://searchfox.org/mozilla-central/source/chrome/nsChromeProtocolHandler.cpp#187 As part of developing the patch, I saw only resource: URLs showing up as file: URLs to the HTML parser, so only resource: URLs got a special check that fast-tracks them to UTF-8 instead of buffering for detection like normal file: URLs. > On Mon, Dec 10, 2018 at 11:28 PM Henri Sivonen wrote: > > > > (Note: This isn't really a Web-exposed feature, but this is a Web > > developer-exposed feature.) > > > > # Summary > > > > Autodetect UTF-8 when loading HTML or plain text from file: URLs (only!). > > > > Some Web developers like to develop locally from file: URLs (as > > opposed to local HTTP server) and then deploy using a Web server that > > declares charset=UTF-8. To get the same convenience as when developing > > with Chrome, they want the files loaded from file: URLs to be treated as > > UTF-8 even though the HTTP header isn't there. > > > > Non-developer users save files from the Web verbatim without the HTTP > > headers and open the files from file: URLs. These days, those files > > are most often in UTF-8 and lack the BOM, and sometimes they lack > > <meta charset=utf-8>, and plain text files can't even use <meta > charset=utf-8>. These users, too, would like a Chrome-like convenience > > when opening these files from file: URLs in Firefox. 
> > > > # Details > > > > If a HTML or plain text file loaded from a file: URL does not contain > > a UTF-8 error in the first 50 MB, assume it is UTF-8. (It is extremely > > improbable for text intended to be in a non-UTF-8 encoding to look > > like valid UTF-8 on the byte level.) Otherwise, behave like at > > present: assume the fallback legacy encoding, whose default depends on > > the Firefox UI locale. > > > > The 50 MB limit exists to avoid buffering everything when loading a > > log file whose size is on the order of a gigabyte. 50 MB is an > > arbitrary size that is significantly larger than "normal" HTML or text > > files, so that "normal"-sized files are examined with 100% confidence > > (i.e. the whole file is examined) but can be assumed to fit in RAM > > even on computers that only have a couple of gigabytes of RAM. > > > > The limit, despite being arbitrary, is checked exactly to avoid > > visible behavior changes depending on how Necko chooses buffer > > boundaries. > > > > The limit is a number of bytes instead of
Intent to implement and ship: UTF-8 autodetection for HTML and plain text loaded from file: URLs
…incremental rendering. (Not acceptable for remote content for user-perceived performance reasons.) This is what the solution for file: URLs does on the assumption that it's OK, because the data in its entirety is (approximately) immediately available. * Causing reloads. This is the mode of badness that applies when our Japanese detector is in use and the first 1024 bytes aren't enough to make the decision. All of these are bad. It's better to make the failure to declare UTF-8 in the http/https case something that the Web developer obviously has to fix (by adding <meta charset=utf-8>, the HTTP header, or the BOM) than to make it appear that things work when actually at least one of the above forms of badness applies. # Bug https://bugzilla.mozilla.org/show_bug.cgi?id=1071816 # Link to standard https://html.spec.whatwg.org/#determining-the-character-encoding step 7 is basically an "anything goes" step for legacy reasons--mainly to allow Japanese encoding detection that IE, WebKit and Gecko had before the spec was written. Chrome started detecting more without prior standard-setting discussion. See https://github.com/whatwg/encoding/issues/68 for after-the-fact discussion. # Platform coverage All # Estimated or target release 66 # Preference behind which this will be implemented Not planning to have a pref for this. # Is this feature enabled by default in sandboxed iframes? This is implemented to apply to all non-resource:-URL-derived file: URLs, but since same-origin inheritance to child frames takes precedence, this isn't expected to apply to sandboxed iframes in practice. # DevTools bug No new dev tools integration. The pre-existing console warning about undeclared character encoding will still be shown in the autodetection case. # Do other browser engines implement this Chrome does, but not with the same number of bytes examined. Safari as of El Capitan (my Mac is stuck on El Capitan) doesn't. Edge as of Windows 10 1803 doesn't. # web-platform-tests As far as I'm aware, WPT doesn't cover file: URL behavior, and there isn't a proper spec for this. Hence, unit tests use mochitest-chrome. # Is this feature restricted to secure contexts? Restricted to file: URLs. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
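The detection step relies on UTF-8 being self-synchronizing and statistically distinctive: non-UTF-8 text essentially never validates. A minimal validity check looks like the following sketch (the real implementation streams and must treat a sequence truncated at the buffering limit as undecided rather than as an error):

    #include <cstddef>
    #include <cstdint>

    // True if the buffer is entirely valid UTF-8 (shortest form, no
    // surrogates, nothing above U+10FFFF). ASCII is trivially valid.
    bool IsValidUtf8(const uint8_t* p, size_t len) {
      const uint8_t* end = p + len;
      while (p < end) {
        uint8_t b = *p++;
        if (b < 0x80) continue;        // ASCII
        size_t trail;
        uint8_t lo = 0x80, hi = 0xBF;  // default continuation range
        if (b >= 0xC2 && b <= 0xDF) {
          trail = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
          trail = 2;
          if (b == 0xE0) lo = 0xA0;    // reject overlong forms
          if (b == 0xED) hi = 0x9F;    // reject surrogates
        } else if (b >= 0xF0 && b <= 0xF4) {
          trail = 3;
          if (b == 0xF0) lo = 0x90;    // reject overlong forms
          if (b == 0xF4) hi = 0x8F;    // reject > U+10FFFF
        } else {
          return false;                // invalid lead byte
        }
        if (static_cast<size_t>(end - p) < trail) return false;
        if (*p < lo || *p > hi) return false;  // narrowed first trail byte
        ++p;
        for (size_t i = 1; i < trail; ++i, ++p) {
          if (*p < 0x80 || *p > 0xBF) return false;
        }
      }
      return true;
    }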
Re: Checking if an nsIURI came from a resource: URL
On Fri, Dec 7, 2018 at 5:05 PM Dave Townsend wrote: > > This suggests that channel.originalURI should help: > https://searchfox.org/mozilla-central/source/netwerk/base/nsIChannel.idl#37 Indeed, getting both nsIURIs from the channel works. Thanks! -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Checking if an nsIURI came from a resource: URL
On Fri, Dec 7, 2018 at 3:23 PM Daniel Veditz wrote: > > I'm afraid to ask why you want to treat these differently. I'd like to make out resource: URLs default to UTF-8 and skip (upcoming) detection between UTF-8 and the locale-affiliated legacy encoding. > Do you have a channel or a principal? I have a channel at a later point, so I could reverse the decision made from the nsIURI by looking at the channel before the initial decision is acted upon in a meaningful way. -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Checking if an nsIURI came from a resource: URL
It appears that by the time resource: URLs reach the HTML parser, their scheme is reported as "file" (at least in debug builds). Is there a way to tell from an nsIURI that it was expanded from a resource: URL? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Rust and --enable-shared-js
On Mon, Sep 24, 2018 at 3:24 PM, Boris Zbarsky wrote: > On 9/24/18 4:04 AM, Henri Sivonen wrote: >> >> How important is --enable-shared-js? I gather its use case is making >> builds faster for SpiderMonkey developers. > > > My use case for it is to be able to use the "exclude samples from library X" > or "collapse library X" tools in profilers (like Instruments) to more easily > break down profiles into "page JS" and "Gecko things". OK. On Mon, Sep 24, 2018 at 1:24 PM, Mike Hommey wrote: >> How important is --enable-shared-js? I gather its use case is making >> builds faster for SpiderMonkey developers. Is that the only use case? > > for _Gecko_ developers. This surprises me. Doesn't the build system take care of not rebuilding SpiderMonkey if it hasn't been edited? Is this only about the link time? What's the conclusion regarding next steps? Should I introduce js_-prefixed copies of the four Rust FFI functions that I want to make available to SpiderMonkey? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Rust and --enable-shared-js
There's an effort to add Rust code to SpiderMonkey: https://bugzilla.mozilla.org/show_bug.cgi?id=1490948 This will introduce a jsrust_shared crate that will just depend on all the Rust crates that SpiderMonkey needs, like gkrust_shared depends on the crates the rest of Gecko needs. This is fine both for building standalone SpiderMonkey (a top-level jsrust will produce a .a and depend on jsrust_shared) and SpiderMonkey as part of libxul (gkrust_shared will depend on jsrust_shared). However, there exists a third configuration: --enable-shared-js. With this option, SpiderMonkey is linked dynamically instead of being baked into libxul. This is fine as long as the set of FFI-exposing crates that SpiderMonkey depends on and the set of FFI-exposing crates that the rest of Gecko depends on are disjoint. If they aren't disjoint, a symbol conflict is expected. AFAICT, this could be solved in at least three ways: 1) Keeping the sets disjoint. If both SpiderMonkey and the rest of Gecko want to call the same Rust code, introduce a differently-named FFI binding for SpiderMonkey. 2) Making FFI symbols .so-internal so that they don't conflict between shared libraries. Per https://bugzilla.mozilla.org/show_bug.cgi?id=1490603 , it seems that this would require rustc changes that don't exist yet. 3) Dropping support for --enable-shared-js. For my immediate use case, I want to make 4 functions available both to SpiderMonkey and the rest of Gecko, so option #1 is feasible, but it won't scale. Maybe #2 becomes feasible before scaling up #1 becomes a problem. But still, I'm curious: How important is --enable-shared-js? I gather its use case is making builds faster for SpiderMonkey developers. Is that the only use case? Is it being used that way in practice? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
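Option 1 in header form (all names here are hypothetical): the same Rust implementation gets exported under two symbols, so that the SpiderMonkey shared library and libxul never both provide the same one:

    #include <cstddef>
    #include <cstdint>

    extern "C" {
    // Exported via gkrust_shared for the rest of Gecko.
    bool gecko_is_ascii(const uint8_t* buffer, size_t len);
    // The same Rust function re-exported via jsrust_shared under a
    // js_-prefixed name for the --enable-shared-js configuration.
    bool js_is_ascii(const uint8_t* buffer, size_t len);
    }

As noted above, this scales poorly: every shared function needs a second exported name and a second declaration to maintain.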
Re: Extending the length of an XPCOM string when writing to it via a raw pointer
On Fri, Aug 31, 2018 at 8:43 AM, Henri Sivonen wrote: > At this point, it's probably relevant to mention that SetCapacity() in > situations other than ahead of a sequence of Append()s is most likely > wrong (and has been so since at least 2004; I didn't bother doing code > archeology further back than that). I wrote some SetCapacity() docs at: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings#Sequence_of_appends_without_reallocating (I caused SetLength, AppendUTF16toUTF8 and AppendUTF8toUTF16 to move from the first list to the second, but, other than that, XPCOM strings have been this way long before I took a look at this code and I'm just the messenger here.) Also filed a static analysis request for this: https://bugzilla.mozilla.org/show_bug.cgi?id=1487612 -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Extending the length of an XPCOM string when writing to it via a raw pointer
On Thu, Aug 30, 2018 at 7:43 PM, Henri Sivonen wrote: >> What is then the point of SetCapacity anymore? > > To avoid multiple allocations during a sequence of Append()s. (This is > documented on the header.) At this point, it's probably relevant to mention that SetCapacity() in situations other than ahead of a sequence of Append()s is most likely wrong (and has been so since at least 2004; I didn't bother doing code archeology further back than that). SetCapacity() followed immediately by Truncate() is bad. SetCapacity() allocates a buffer. Truncate() releases the buffer. SetCapacity() followed immediately by AssignLiteral() of the compatible character type ("" literal with nsACString and u"" literal with nsAString) is bad. SetCapacity() allocates a buffer. AssignLiteral() releases the buffer and makes the string point to the literal in POD. SetCapacity() followed immediately by Adopt() is bad. SetCapacity() allocates a buffer. Adopt() releases the buffer and makes the string point to the buffer passed to Adopt(). SetCapacity() followed immediately by Assign() is likely bad. If the string that gets assigned points to a shareable buffer and doesn't need to be copied, Assign() releases the buffer allocated by SetCapacity(). Allocating an nsAuto[C]String and immediately calling SetCapacity() with a constant argument is bad. If the requested capacity is smaller than the inline buffer, it's a no-op. If the requested capacity is larger, the inline buffer is wasted stack space. Instead of SetCapacity(N), it makes sense to declare nsAuto[C]StringN<N> (with awareness that a very large N may be a problem in terms of overflowing the run-time stack). (I've seen all of the above in our code base and have a patch coming up.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
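[To set the intended use next to one of the misuses listed above, a minimal sketch; JoinWords and Wasteful are hypothetical functions.]

  #include "nsString.h"
  #include "nsTArray.h"

  // Good: SetCapacity() ahead of a sequence of Append()s, so the
  // appends don't reallocate multiple times.
  void JoinWords(const nsTArray<nsCString>& aWords, nsACString& aOut) {
    size_t total = 0;
    for (const auto& word : aWords) {
      total += word.Length();
    }
    aOut.SetCapacity(total);
    for (const auto& word : aWords) {
      aOut.Append(word);
    }
  }

  // Bad: the buffer allocated by SetCapacity() is immediately released
  // by AssignLiteral(), which points the string at the literal instead.
  void Wasteful(nsACString& aOut) {
    aOut.SetCapacity(64);
    aOut.AssignLiteral("example");
  }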
Re: Extending the length of an XPCOM string when writing to it via a raw pointer
On Thu, Aug 30, 2018 at 6:00 PM, smaug wrote: > On 08/30/2018 11:21 AM, Henri Sivonen wrote: >> >> We have the following pattern in our code base: >> >> 1) SetCapacity(newCapacity) is called on an XPCOM string. >> 2) A pointer obtained from BeginWriting() is used for writing more >> than Length() but no more than newCapacity code units to the XPCOM >> string. >> 3) SetLength(actuallyWritten) is called on the XPCOM string such >> that actuallyWritten > Length() but actuallyWritten <= newCapacity. >> >> This is an encapsulation violation, because the string implementation >> doesn't know that the content past Length() is there. > > How so? The whole point of capacity is that the string has that much > capacity. It has that much capacity for the use of the string implementation so that you can do a sequence of Append()s without reallocating multiple times. It doesn't mean that the caller is eligible to write into the internal structure of the string in an undocumented way. > The caller >> >> assumes that step #3 will not reallocate and will only write a >> zero-terminator at actuallyWritten and set mLength to actuallyWritten. >> (The pattern is common enough that clearly people have believed it to >> be a valid pattern. However, I haven't seen any in-tree or on-wiki >> string documentation endorsing the pattern.) >> >> It should be non-controversial that this is an encapsulation >> violation, > > Well, I'm not seeing any encapsulation violation ;) > > >> but does the encapsulation violation matter? It matters if >> we want SetLength() to be able to conserve memory by allocating a >> smaller buffer when actuallyWritten code units would fit in a smaller >> mozjemalloc bucket. > > > Please be very very careful when doing allocations and deallocations. They > are very slow, showing > up all the time in performance profiles. There is a threshold so that we don't reallocate from small to even smaller. There's a good chance that the threshold should be higher than it is now. > In order for the above pattern to work if >> >> SetLength() can reallocate in such a case, SetLength() has to memcpy the >> whole buffer in case someone has written content that the string >> implementation is unaware of instead of just memcpying the part of the >> buffer that the string implementation knows to be in use. Pessimizing >> the number of code units to memcpy is bad for performance. >> >> It's unclear if trying to use a smaller mozjemalloc bucket is a >> worthwhile thing. It obviously is for large long-lived strings and it >> obviously isn't for short-lived strings even if they are large. >> SetLength() doesn't know what the future holds for the string. :-( But >> in any case, it's bad if we can't make changes that are sound from the >> perspective of the string encapsulation, because we have other code >> violating the encapsulation. >> >> After the soft freeze, I'd like to change things so that we memcpy >> only the part of the buffer that the string implementation knows is in >> use. To that end, we should stop using the above pattern that is an >> encapsulation violation. >> > What is then the point of SetCapacity anymore? To avoid multiple allocations during a sequence of Append()s. (This is documented on the header.) >> For m-c, I've filed >> https://bugzilla.mozilla.org/show_bug.cgi?id=1472113 and worked on >> fixing the bugs that block it. For c-c, I've filed >> https://bugzilla.mozilla.org/show_bug.cgi?id=1486706 but don't intend >> to do the work of investigating or fixing the string usage in c-c. 
>> As for fixing the above pattern, there are two alternatives. The first one >> is: >> >> 1) SetLength(newCapacity) >> 2) BeginWriting() >> 3) Truncate(actuallyWritten) (or SetLength(actuallyWritten); Truncate >> simply signals to the reader that the string isn't being made longer) >> >> With this pattern, writing happens to the part of the buffer that the >> string implementation believes to be in use. This has the downside >> that the first SetLength() call (like, counter-intuitively, >> SetCapacity() currently!) writes the zero terminator, which from the >> point of view of CPU caches is an out-of-the-blue write that's not >> part of a well-behaved forward-only linear write pattern and not >> necessarily near recently-accessed locations. >> >> The second alternative is BulkWrite() in C++ and bulk_write() in Rust. > > The API doesn't seem to b
Extending the length of an XPCOM string when writing to it via a raw pointer
We have the following pattern in our code base: 1) SetCapacity(newCapacity) is called on an XPCOM string. 2) A pointer obtained from BeginWriting() is used for writing more than Length() but no more than newCapacity code units to the XPCOM string. 3) SetLength(actuallyWritten) is called on the XPCOM string such that actuallyWritten > Length() but actuallyWritten <= newCapacity. This is an encapsulation violation, because the string implementation doesn't know that the content past Length() is there. The caller assumes that step #3 will not reallocate and will only write a zero-terminator at actuallyWritten and set mLength to actuallyWritten. (The pattern is common enough that clearly people have believed it to be a valid pattern. However, I haven't seen any in-tree or on-wiki string documentation endorsing the pattern.) It should be non-controversial that this is an encapsulation violation, but does the encapsulation violation matter? It matters if we want SetLength() to be able to conserve memory by allocating a smaller buffer when actuallyWritten code units would fit in a smaller mozjemalloc bucket. In order for the above pattern to work if SetLength() can reallocate in such a case, SetLength() has to memcpy the whole buffer in case someone has written content that the string implementation is unaware of instead of just memcpying the part of the buffer that the string implementation knows to be in use. Pessimizing the number of code units to memcpy is bad for performance. It's unclear if trying to use a smaller mozjemalloc bucket is a worthwhile thing. It obviously is for large long-lived strings and it obviously isn't for short-lived strings even if they are large. SetLength() doesn't know what the future holds for the string. :-( But in any case, it's bad if we can't make changes that are sound from the perspective of the string encapsulation, because we have other code violating the encapsulation. After the soft freeze, I'd like to change things so that we memcpy only the part of the buffer that the string implementation knows is in use. To that end, we should stop using the above pattern that is an encapsulation violation. For m-c, I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1472113 and worked on fixing the bugs that block it. For c-c, I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1486706 but don't intend to do the work of investigating or fixing the string usage in c-c. As for fixing the above pattern, there are two alternatives. The first one is: 1) SetLength(newCapacity) 2) BeginWriting() 3) Truncate(actuallyWritten) (or SetLength(actuallyWritten); Truncate simply signals to the reader that the string isn't being made longer) With this pattern, writing happens to the part of the buffer that the string implementation believes to be in use. This has the downside that the first SetLength() call (like, counter-intuitively, SetCapacity() currently!) writes the zero terminator, which from the point of view of CPU caches is an out-of-the-blue write that's not part of a well-behaved forward-only linear write pattern and not necessarily near recently-accessed locations. The second alternative is BulkWrite() in C++ and bulk_write() in Rust. This is new API that is well-behaved in terms of the cache access pattern and is also more versatile in the sense that it lets the caller know how newCapacity was rounded up, which is relevant to callers that ask for best-case capacity and then ask for more capacity if there turns out to be more to write. 
When the caller is made aware of the rounding, a second request for added capacity can be avoided if the amount that actually needs to be written exceeds the best case estimate but fits within the rounded-up capacity. In Rust, bulk_write() is rather nicely misuse-resistant. However, on the C++ side the lack of a borrow checker as well as mozilla::Result not working with move-only types (https://bugzilla.mozilla.org/show_bug.cgi?id=1418624) pushes more things to documentation. The documentation can be found at https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings#Bulk_Write https://searchfox.org/mozilla-central/rev/2fe43133dbc774dda668d4597174d73e3969181a/xpcom/string/nsTSubstring.h#1190 https://searchfox.org/mozilla-central/rev/2fe43133dbc774dda668d4597174d73e3969181a/xpcom/string/nsTSubstring.h#32 P.S. GetMutableData() is redundant with BeginWriting() and SetLength(). It's used very rarely and I'd like to remove it as redundant, so please don't use GetMutableData(). -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
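[To restate the problematic pattern and the first alternative in code, a sketch; FillBuffer is a hypothetical writer that returns how many code units it actually wrote.]

  #include "nsString.h"

  size_t FillBuffer(char* aPtr, size_t aMax); // hypothetical

  // The encapsulation-violating pattern:
  void OldFill(nsACString& aOut) {
    aOut.SetCapacity(64); // Length() is still 0
    char* ptr = aOut.BeginWriting();
    size_t written = FillBuffer(ptr, 64); // writes past Length()
    aOut.SetLength(written); // assumes this won't reallocate
  }

  // The first alternative: writing happens within the part of the
  // buffer that the string implementation believes to be in use.
  void NewFill(nsACString& aOut) {
    aOut.SetLength(64);
    char* ptr = aOut.BeginWriting();
    size_t written = FillBuffer(ptr, 64);
    aOut.Truncate(written);
  }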
Re: Dead-code removal of unused Rust FFI exports
On Tue, Aug 28, 2018 at 4:56 PM, Till Schneidereit wrote: > On Tue, Aug 28, 2018 at 3:20 PM Mike Hommey wrote: >> We don't LTO across languages on any platform yet. Rust is LTOed on all >> platforms, which removes a bunch of its symbols. Everything that is >> exposed for C/C++ from Rust, though, is left alone. That's likely to >> stay true even with cross-language LTO, because as far as the linker is >> concerned, those FFI symbols might be used by code that links against >> libxul, so it would still export them. We'd essentially need the >> equivalent to -fvisibility=hidden for Rust for that to stop being true. Exporting Rust FFI symbols from libxul seems bad not just for binary size but also in terms of giving a contact surface to invasive third-party Windows software. Do we have a bug on file tracking the hiding of FFI symbols from the outside of libxul? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Dead-code removal of unused Rust FFI exports
Does some lld mechanism successfully remove dead code when gkrust exports some FFI function that the rest of Gecko never ends up calling? I.e. in terms of code size, is it OK to vendor an FFI-exposing Rust crate where not every FFI function is used (at least right away)? -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Please don't use functions from ctype.h and strings.h
I think it's worthwhile to have a lint, but regexps are likely to have false positives, so using clang-tidy is probably better. A bug is on file: https://bugzilla.mozilla.org/show_bug.cgi?id=1485588 On Mon, Aug 27, 2018 at 4:06 PM, Tom Ritter wrote: > Is this something worth making a lint over? It's pretty easy to make > regex-based lints, e.g. > > yml-only based lint: > https://searchfox.org/mozilla-central/source/tools/lint/cpp-virtual-final.yml > > yml+python for slightly more complicated regexing: > https://searchfox.org/mozilla-central/source/tools/lint/mingw-capitalization.yml > https://searchfox.org/mozilla-central/source/tools/lint/cpp/mingw-capitalization.py > > -tom > > On Mon, Aug 27, 2018 at 7:04 AM, Henri Sivonen wrote: >> >> Please don't use the functions from ctype.h and strings.h. >> >> See: >> https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/ >> https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/ >> >> https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c >> >> In addition to these being locale-sensitive, the functions from >> ctype.h are defined to take (signed) int with the value space of >> *unsigned* char or EOF and other argument values are Undefined >> Behavior. Therefore, on platforms where char is signed, passing a char >> sign-extends to int and invokes UB if the most-significant bit of the >> char was set! Bug filed 15 years ago! >> https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of >> implementations doing anything surprising with this UB but there >> exists precedent for *compiler* writers looking at the standard >> *library* UB language and taking calls into standard library functions >> as optimization-guiding assertions about the values of their >> arguments, so better not risk it.) >> >> For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h. >> >> For tolower() and toupper(), please use ToLowerCaseASCII() and >> ToUpperCaseASCII() from nsUnicharUtils.h >> >> For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed >> versions from nsCRT.h. >> >> (Ideally, we should scrub these from vendored C code, too, since being >> in third-party code doesn't really make the above problems go away.) >> >> -- >> Henri Sivonen >> hsivo...@mozilla.com >> ___ >> dev-platform mailing list >> dev-platform@lists.mozilla.org >> https://lists.mozilla.org/listinfo/dev-platform > > -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Please don't use functions from ctype.h and strings.h
Please don't use the functions from ctype.h and strings.h. See: https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend/ https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/ https://stackoverflow.com/questions/2898228/can-isdigit-legitimately-be-locale-dependent-in-c In addition to these being locale-sensitive, the functions from ctype.h are defined to take (signed) int with the value space of *unsigned* char or EOF; other argument values are Undefined Behavior. Therefore, on platforms where char is signed, passing a char sign-extends to int and invokes UB if the most-significant bit of the char was set! Bug filed 15 years ago! https://bugzilla.mozilla.org/show_bug.cgi?id=216952 (I'm not aware of implementations doing anything surprising with this UB but there exists precedent for *compiler* writers looking at the standard *library* UB language and taking calls into standard library functions as optimization-guiding assertions about the values of their arguments, so better not risk it.) For isfoo(), please use mozilla::IsAsciiFoo() from mozilla/TextUtils.h. For tolower() and toupper(), please use ToLowerCaseASCII() and ToUpperCaseASCII() from nsUnicharUtils.h. For strcasecmp() and strncasecmp(), please use their nsCRT::-prefixed versions from nsCRT.h. (Ideally, we should scrub these from vendored C code, too, since being in third-party code doesn't really make the above problems go away.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
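[A minimal sketch of the signedness hazard and the suggested replacement; CountAsciiLetters is a hypothetical function.]

  #include <ctype.h>
  #include "mozilla/TextUtils.h"

  size_t CountAsciiLetters(const char* aBytes, size_t aLen) {
    size_t count = 0;
    for (size_t i = 0; i < aLen; ++i) {
      // Bad: if char is signed and aBytes[i] is, say, a UTF-8 lead byte
      // (most-significant bit set), it sign-extends to a negative int,
      // which is Undefined Behavior for isalpha():
      // if (isalpha(aBytes[i])) { ++count; }

      // Good: locale-independent and defined for all char values.
      if (mozilla::IsAsciiAlpha(aBytes[i])) {
        ++count;
      }
    }
    return count;
  }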
Changes in XPCOM string encoding conversions
I've made changes to encoding conversions between XPCOM strings. Here are the important bits of info: * The conversions are now generally faster, so processing text as UTF-8 should be considered more appropriate than before even if there exists a case where the data needs to be passed to a UTF-16 consumer. * There are now faster paths in Rust for appending or assigning &str to nsAString. If you have an &str, please use the *_str() methods instead of the *_utf8() methods on nsAString in Rust. (That is, the *_str() methods make use of the knowledge that the input is guaranteed to be valid UTF-8.) * I'd like to make the UTF-16 to Latin1 (the function names say ASCII instead of Latin1 for legacy reasons) conversion assert in debug builds if the input isn't in the Latin1 range, so if you have a mostly-ASCII UTF-16 string that you want to printf for logging and don't care about what happens to non-ASCII, please convert to UTF-8 as your "I don't care about non-ASCII" conversion. * The conversions between UTF-16 and UTF-8 in both directions now implement spec-compliant REPLACEMENT CHARACTER generation if the input isn't valid UTF. Previously, the output got truncated. This is not to say that it's now OK to be less diligent about UTF validity but to say that you can't rely on the old truncation behavior. * There are now conversions between UTF-8 and Latin1 to allow for more efficient interaction with UTF-8 and SpiderMonkey strings and DOM text nodes going forward. * The conversions no longer accept zero-terminated C-style strings. The cost of strlen() is now made visible to the caller by requiring the caller to wrap C-style strings with mozilla::MakeStringSpan(). (Please avoid C-style strings. Strings that know their length are nice. This change wasn't made in order to make the use of C-style strings hard, though, but in order to avoid clang complaining about ambiguous overloads.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
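[For the second bullet, a small sketch in Rust; set_title is a hypothetical function, and this assumes the assign_str() method described in the message above.]

  use nsstring::nsString;

  fn set_title(out: &mut nsString, title: &str) {
      // assign_str() exploits the guarantee that an &str is valid
      // UTF-8, so the conversion to UTF-16 can skip the validation
      // that the *_utf8() counterparts have to perform.
      out.assign_str(title);
  }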
Re: Rust crate approval
On Wed, Jun 27, 2018 at 5:02 AM, Adam Gashlin wrote: > * Already vendored crates > Can I assume any crates we have already in mozilla-central are ok to use? Seems like a reasonable assumption. > * Updates > I need winapi 0.3.5 for BITS support, currently third_party/rust/winapi is > 0.3.4. There should be no problem updating it, but should I have this > reviewed by the folks who originally vendored it into mozilla-central? In my opinion, it should be enough for _someone_ qualified to review code in the area of Windows integration to review the diff. > * New crates > I'd like to use the windows-service crate, which seems well written and has > few dependencies, but the first 0.1.0 release was just a few weeks ago. I'd > like to have that reviewed at least as carefully as my own code, > particularly given how much unsafety there is, but where do I draw the > line? For instance, it depends on "widestring", which is small and has been > around for a while but isn't widely used, should I have that reviewed > internally as well? Is popularity a reasonable measure? In principle, all code landing in m-c needs to be reviewed, but sometimes the reviewer may rubber-stamp code instead of truly reviewing it carefully. All the newly-vendored code should be part of the review request and then it's up to the reviewer to decide if it's appropriate to look at some code less carefully because there are other indicators of quality. As for widestring specifically, a cursory look at the code suggests that it's a quality crate and should have no trouble passing review. It is also small enough that it should be actually feasible to review it instead of rubber-stamping it. (For Mozilla-developed code that is on a performance-sensitive path, there exists encoding_rs::mem (particularly https://docs.rs/encoding_rs/0.8.4/encoding_rs/mem/fn.convert_str_to_utf16.html and https://docs.rs/encoding_rs/0.8.4/encoding_rs/mem/fn.convert_utf16_to_str.html), which doesn't provide the ergonomics that widestring provides but provides faster (SIMD-accelerated on our tier-1 CPU architectures and aarch64, which is on path to tier-1) conversions for long (16 code units or longer) strings containing mostly ASCII code points. An update service probably isn't performance-sensitive in this way. I'm mentioning this to generate awareness generally on the topic of UTF-16 conversions in m-c Rust code.) -- Henri Sivonen hsivo...@mozilla.com ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
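[As a concrete illustration of the encoding_rs::mem call mentioned above, a sketch; to_utf16 is a hypothetical helper, and the buffer sizing relies on UTF-16 output never needing more code units than the UTF-8 input has bytes.]

  use encoding_rs::mem::convert_str_to_utf16;

  fn to_utf16(src: &str) -> Vec<u16> {
      // A UTF-16 buffer of src.len() code units is always large enough.
      let mut dst = vec![0u16; src.len()];
      let written = convert_str_to_utf16(src, &mut dst);
      dst.truncate(written);
      dst
  }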
Re: Update on rustc/clang goodness
On Wed, May 30, 2018 at 5:16 PM, Mike Hommey wrote: > On Wed, May 30, 2018 at 02:40:01PM +0300, Henri Sivonen wrote: >> The Linux distro case is >> trickier than Mozilla's compiler choice. For CPUs that are tier-3 for >> Mozilla, we already tolerate less great performance attributes in >> order to enable availability, so distros keeping using GCC for tier-3 >> probably isn't a problem. x86_64 could be a problem, though. If >> Firefox's performance becomes significantly dependent on having >> cross-language inlining, and I expect it will, having a substantial >> portion of the user base run without it while thinking they have a >> top-tier build could be bad. I hope we can get x86_64 Linux distros to >> track our compiler configuration closely. > > That part might end up more difficult than one could expect. > Cross-language inlining is going to require rustc and clang having a > compatible llvm ir, and that's pretty much guaranteed to be a problem, > even for Mozilla. I thought the rustc codebase supported building with unpatched LLVM in order to let distros maintain one copy of LLVM source (if not .so). Is that not the case? Why couldn't Mozilla build clang with Rust's LLVM fork and use that for building releases? (And move Rust's fork forward as needed.) (Also, what Ehsan said about IR compat suggests these might not even need to be as closely synced.) -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Update on rustc/clang goodness
On Wed, May 30, 2018 at 8:06 AM, Dave Townsend wrote: > On Tue, May 29, 2018 at 10:03 PM Jeff Gilbert wrote: >> I get that, but it reminds me of the reasons people give for "our >> website works best in $browser". > > I was concerned by this too but found myself swayed by the arguments in > https://blog.mozilla.org/nfroyd/2018/05/29/when-implementation-monoculture-right-thing/ and > in particular the first comment there. Indeed, the first comment there (by roc) gets to the point. Additionally, the reasons for not supporting multiple browsers tend to be closer to the "didn't bother" kind whereas we're looking to get a substantial benefit from clang that MSVC and GCC don't offer to us but clang likely will: Cross-language inlining across code compiled with clang and code compiled with rustc. To the extent Mozilla runs the compiler, it makes sense to go for the Open Source choice that allows us to deliver better on "performance as a feature". We still have at least one static analysis running on GCC, so I wouldn't expect GCC-compatibility to be dropped even if the app wouldn't be "best compiled with" GCC. The Linux distro case is trickier than Mozilla's compiler choice. For CPUs that are tier-3 for Mozilla, we already tolerate less great performance attributes in order to enable availability, so distros continuing to use GCC for tier-3 probably isn't a problem. x86_64 could be a problem, though. If Firefox's performance becomes significantly dependent on having cross-language inlining, and I expect it will, having a substantial portion of the user base run without it while thinking they have a top-tier build could be bad. I hope we can get x86_64 Linux distros to track our compiler configuration closely. I do feel bad for the GCC devs, but it's worth noting that this is a result of a deliberate decision not to modularize GCC for licensing strategy reasons, while LLVM has been designed to be modular, demand for which has had solid technical reasons. The modularity meant it made more sense to build rustc on LLVM than on GCC, and now that technical design leads to better synergy with clang. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: License of test data?
On Thu, May 17, 2018 at 8:31 PM, mhoye wrote: > Well, more than a day or two. The MIT license is fine to include, and we > have a pile of MIT-licensed code in-tree already. > > Other already-in-tree MPL-2.0 compatible licenses - the "just do it" set, > basically - include Apache 2.0, BSD 2- and 3-clause, LGPL 2.1 and 3.0, GPL > 3.0 and the Unicode Consortium's ICU. Does "just do it" imply that it's now OK to import that stuff without an analog of the previous r+ from Gerv? > For anything not on that list a legal bug is def. the next step. For test files, i.e. stuff that doesn't get linked into libxul, we also have precedent for the MPL-incompatible CC-by and CC-by-sa. I hope we can add these to the above list. On Fri, May 18, 2018 at 12:33 AM, Mike Hommey wrote: > The above list is for tests. For things that go in Firefox, it's more > complicated. LGPL have requirements that makes us have to put all LGPL > libraries in a separate dynamic library (liblgpllibs), and GPL can't be > used at all. For stuff that goes into Firefox, MIT and BSD (and, I'm guessing, Apache with NOTICE file) involve editing https://searchfox.org/mozilla-central/source/toolkit/content/license.html , too. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Proposed W3C Charter: JSON-LD Working Group
On Sun, Apr 29, 2018, 19:35 L. David Baron wrote: > OK, here's a draft of an explicit abstention that I can submit later > today. Does this seem reasonable? > This looks good to me. Thank you. > > One concern that we've had over the past few years about JSON-LD > is that some people have been advocating that formats adopt > JSON-LD semantics, but at the same time allow processing as > regular JSON, as a way to make the adoption of JSON-LD > lighter-weight for producers and consumers who (like us) don't > want to have to implement full JSON-LD semantics. This yields a > format with two classes of processors that will produce different > results on many inputs, which is bad for interoperability. And > full JSON-LD implementation is often more complexity than needed > for both producers and consumers of content. We don't want > people who produce Web sites or maintain Web browsers to have to > deal with this complexity. For more details on this issue, see > https://hsivonen.fi/no-json-ns/ . > > This leads us to be concerned about the Coordination section in > the charter, which suggests that some W3C Groups that we are > involved in or may end up implementing the work of (Web of > Things, Publishing) will use JSON-LD. We would prefer that the > work of these groups not use JSON-LD for the reasons described > above, but this charter seems to imply that they will. > > While in general we support doing maintenance (and thus aren't > objecting), we're also concerned that the charter is quite > open-ended about what new features will be included (e.g., > referring to "requests for new features" and "take into account > new features and desired simplifications that have become > apparent since its publication"). As the guidance in > https://www.w3.org/Guide/standards-track/ suggests, new features > should be limited to those already incubated in the CG. (If we > were planning to implement, we might be objecting on these > grounds.) > > > -David > > -- > 𝄞 L. David Baron http://dbaron.org/ 𝄂 > 𝄢 Mozilla https://www.mozilla.org/ 𝄂 > Before I built a wall I'd ask to know > What I was walling in or walling out, > And to whom I was like to give offense. >- Robert Frost, Mending Wall (1914) > ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Proposed W3C Charter: JSON-LD Working Group
On Fri, Apr 27, 2018 at 1:04 AM, L. David Baron wrote: > The W3C is proposing a charter for: > > JSON-LD Working Group > https://www.w3.org/2018/03/jsonld-wg-charter.html > https://lists.w3.org/Archives/Public/public-new-work/2018Mar/0004.html > > Mozilla has the opportunity to send comments or objections through > Sunday, April 29. (Sorry for failing to send this out sooner!) > > This is a charter to produce JSON-LD 1.1 (A JSON-based Serialization > for Linked Data), which is a revision of JSON-LD 1.0, which was > developed under the now-closed RDF Working Group. The > specifications proposed in a charter have been developed in a > community group. > > Please reply to this thread if you think there's something we should > say as part of this charter review, or if you think we should > support or oppose it. JSON-LD's compatibility with the RDF data model and the fundamental principle that identifiers expand to URIs means that JSON-LD perpetuates the fundamental ergonomic problem of RDF. All serializations of RDF, JSON-LD included, try to take steps to alleviate the visibility of the problem on the syntax level but none can get rid of the problem on the data model layer (since they subscribe to the fundamental principle that is the source of the problem and don't break data model compatibility). Thus, code that processes the formats either has to be unergonomic or incorrect. It's bad to have specs that promote unergonomic or incorrect implementations and especially the mutually-incompatible co-existence of the two. See https://hsivonen.fi/no-json-ns/ for an elaboration--especially the epilog. JSON-LD evangelism, including the slides linked to from the charter (slide 3), tends to be about selling the format by claiming that developers can ignore the RDF/URI stuff (i.e. write code that's incorrect in terms of the full processing model). The very last section of https://hsivonen.fi/no-json-ns/ addresses this based on experience from Web-scale formats. For this reason, I think we should resist introducing dependencies on JSON-LD in formats and APIs that are relevant to the Web Platform. I think it follows that we should not support this charter. I expect this charter to pass in any case, so I'm not sure us saying something changes anything, but it might still be worth a try to register displeasure about the prospect of JSON-LD coming into contact with stuff that Web engines or people who make Web apps or sites need to deal with and to register displeasure with designing formats whose full processing model differs from how the format is evangelized to developers (previously: serving XHTML as text/html while pretending to get benefits of the XML processing model that way). -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Default Rust optimization level decreased from 2 to 1
On Wed, Apr 25, 2018 at 7:11 PM, Gregory Szorc wrote: > The build peers have long thought about adding the concept of “build > profiles” to the build system. Think of them as in-tree mozconfigs for > common, supported scenarios. This would be good to have. It would also help if such mozconfigs had comprehensive comments explaining how our release builds differ from tooling defaults especially now that we have areas of code that are developed outside a full Firefox build. For example, I A/B tested code for performance using cargo bench and mistakenly thought that cargo's "release" mode meant the same thing as Firefox "release" mode. I only later learned that I had developed with opt-level=3 (cargo's default for "release") and we ship with opt-level=2. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
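[For anyone benchmarking a standalone crate with the intent of drawing conclusions about Firefox, a sketch of pinning the crate's cargo profiles in its Cargo.toml to the opt-level this thread says we ship; adjust if the shipped configuration changes.]

  [profile.release]
  opt-level = 2

  [profile.bench]
  opt-level = 2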
Re: Intent To Require Manifests For Vendored Code In mozilla-central
On Tue, Apr 10, 2018 at 7:33 AM, Byron Jones wrote: > glob wrote: >> >> The plan is to create a YAML file for each library containing metadata >> such as the homepage url, vendored version, bugzilla component, etc. See >> https://goo.gl/QZyz4xfor the full specification. > > > this should be: https://goo.gl/QZyz4x for the full specification. This proposal makes sense to me when it comes to libraries that are not vendored from crates.io. However, this seems very heavyweight and only adds the Bugzilla metadata for crates.io crates. It seems to me that declaring the Bugzilla component isn't worth the trouble of having another metadata file in addition to Cargo.toml. Additionally, the examples suggest that this invents new ad hoc license identifiers. I suggest we not do that but instead use https://spdx.org/licenses/ and have a script to enforce that bogus values don't creep in. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Is realloc() between bucket sizes worthwhile with jemalloc?
On Mon, Apr 9, 2018 at 10:30 PM, Eric Rahm wrote: >> Upon superficial code reading, it seems to me that currently changing >> the capacity of an nsA[C]String might uselessly use realloc to copy >> data that's not semantically live from the string's point of view >> and wouldn't really need to be preserved. Have I actually discovered >> useless copying or am I misunderstanding? > > In this case I think you're right. In the string code we use a doubling > strategy up to 8MiB so they'll always be in a new bucket/chunk. After 8MiB > we grow by 1.125 [2], but always round up to the nearest MiB. Our > HugeRealloc logic always makes a new allocation if the difference is greater > than or equal to 1MiB [3] so that's always going to get hit. I should note > that on OSX we use some sort of 'pages_copy' when the realloc is large > enough, this is probably more efficient than memcpy. Thanks. Being able to avoid useless copying for most strings probably outweighs the loss of the pages_copy optimization for huge strings on Mac. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Editing a vendored crate for a try push
On Mon, Apr 9, 2018 at 10:32 PM, wrote: > On Monday, April 9, 2018 at 11:39:35 AM UTC-4, Henri Sivonen wrote: >> How do I waive .cargo-checksum.json checking for a crate? > > In bug 1449613 (part 12) I just hand-edited the .cargo-checksum.json in > question, and updated the sha256 values for the modified files. That was > enough to get try runs going (though the debug builds do show related > failures, they still built and ran my tests). This is what I've done, but it shouldn't have to be like this. On Mon, Apr 9, 2018 at 8:48 PM, Andreas Tolfsen wrote: > I don’t know the exact answer to your question, but it might be > possible to temporarily depend on the crate by its path? > > There are a couple of examples of this in > https://searchfox.org/mozilla-central/source/testing/geckodriver/Cargo.toml. > > Remember to run "cargo update -p ". This seems more complicated than editing the crate in place and manually editing the sha256 values. What I'm looking for is a simple way to skip the sha256 editing. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Is realloc() between bucket sizes worthwhile with jemalloc?
My understanding is that under some "huge" size, jemalloc returns allocations from particularly-sized buckets. This makes me expect that realloc() between bucket sizes is going to always copy the data instead of just adjusting allocation metadata, because to do otherwise would mess up the bucketing. Is this so? Specifically, is it actually useful that nsStringBuffer uses realloc() as opposed to malloc(), memcpy() of the actually semantically filled amount, and free()? Upon superficial code reading, it seems to me that currently changing the capacity of an nsA[C]String might uselessly use realloc to copy data that's not semantically live from the string's point of view and wouldn't really need to be preserved. Have I actually discovered useless copying or am I misunderstanding? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
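[To spell out the difference being asked about, a sketch in C; liveLength is the amount of semantically live data, which can be much smaller than the old allocation, and error handling is omitted.]

  #include <stdlib.h>
  #include <string.h>

  char* ResizePreservingLive(char* p, size_t newSize, size_t liveLength) {
    // realloc(p, newSize) would copy min(oldSize, newSize) bytes,
    // including bytes past liveLength that the string doesn't need.
    char* q = malloc(newSize);
    memcpy(q, p, liveLength); // copy only the live prefix
    free(p);
    return q;
  }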
Editing a vendored crate for a try push
What's the current status of tooling for editing vendored crates for local testing and try pushes? It looks like our toml setup is too complex for cargo edit-locally to handle (or, alternatively, I'm holding it wrong). It also seems that mach edit-crate never happened. How do I waive .cargo-checksum.json checking for a crate? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform