Re: Intent to implement and ship: UTF-8 autodetection for HTML and plain text loaded from file: URLs

2018-12-11 Thread Henri Sivonen
On Tue, Dec 11, 2018 at 2:24 AM Martin Thomson  wrote:
> This seems reasonable, but 50M is a pretty large number.  Given the
> odds of UTF-8 detection failing, I would have thought that this could
> be much lower.

Consider the case of a document of ASCII text with a copyright sign in
the footer. I'd rather not make anyone puzzle over why the behavior of
the footer depends on how much text comes before the footer.
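
(For concreteness, here is a minimal Rust sketch, not Gecko code, of why that footer matters at the byte level: in windows-1252 the copyright sign is the lone byte 0xA9, which is never valid UTF-8, while in UTF-8 it is the two-byte sequence 0xC2 0xA9. Any ASCII-only prefix looks the same in both encodings, so a detector that stops before the footer can't tell them apart.)

```rust
fn main() {
    // The same "footer" encoded two ways; the ASCII prefix is identical.
    let as_windows_1252: &[u8] = b"Lots of ASCII text... \xA9 2018"; // 0xA9 = copyright sign in windows-1252
    let as_utf_8: &[u8] = b"Lots of ASCII text... \xC2\xA9 2018"; // 0xC2 0xA9 = copyright sign in UTF-8

    // Validating the whole buffer distinguishes the two cases...
    assert!(std::str::from_utf8(as_utf_8).is_ok());
    assert!(std::str::from_utf8(as_windows_1252).is_err());

    // ...but an ASCII-only prefix of either buffer is valid UTF-8, so a
    // detector that gives up before reaching the footer sees no difference.
    assert!(std::str::from_utf8(&as_windows_1252[..10]).is_ok());
}
```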

50 MB is intentionally extremely large relative to "normal" HTML and
text files so that the limit is reached approximately "never" unless
you open *huge* log files.

The HTML spec is about 11 MB these days, so that's an existence proof
that a non-log-file HTML document can exceed 10 MB. Of course, the
limit doesn't need to be larger than present-day UTF-8 files, only
larger than "normal"-sized *legacy* non-UTF-8 files.

It is quite possible that 50 MB is *too* large considering 32-bit
systems and what *other* allocations are proportional to the buffer
size, and I'm open to changing the limit to something smaller than 50
MB as long as it's still larger than "normal" non-UTF-8 HTML and text
files.

How about I change it to 5 MB on the assumption that that's still very
large relative to pre-UTF-8-era HTML and text file sizes?

> What is the number in Chrome?

It depends. It's unclear to me what exactly it depends on. Based on
https://github.com/whatwg/encoding/issues/68#issuecomment-272993181 ,
I expect it to depend on some combination of file system, OS kernel
and Chromium IO library internals.

On Ubuntu 18.04 with ext4 on an SSD, the number is 64 KB. On Windows
10 1803 with NTFS on an SSD, it's something smaller.

I think making the limit depend on the internals of file IO buffering
instead of on a constant in the HTML parser is a really bad idea. Also,
64 KB or anything smaller seems way too small for the purpose of
making sure that the user approximately never needs to puzzle over
why things are different based on the length of the ASCII prefix of a
file with non-ASCII content later in the file.

> I assume that other local sources like chrome: are expected to be
> annotated properly.

From source inspection, it seems that chrome: URLs already get
hard-coded to UTF-8 on the channel level:
https://searchfox.org/mozilla-central/source/chrome/nsChromeProtocolHandler.cpp#187

As part of developing the patch, I saw only resource: URLs showing up
as file: URLs to the HTML parser, so only resource: URLs got a special
check that fast-tracks them to UTF-8 instead of buffering for
detection like normal file: URLs.

> On Mon, Dec 10, 2018 at 11:28 PM Henri Sivonen  wrote:
> >
> > (Note: This isn't really a Web-exposed feature, but this is a Web
> > developer-exposed feature.)
> >
> > # Summary
> >
> > Autodetect UTF-8 when loading HTML or plain text from file: URLs (only!).
> >
> > Some Web developers like to develop locally from file: URLs (as
> > opposed to local HTTP server) and then deploy using a Web server that
> > declares charset=UTF-8. To get the same convenience as when developing
> > with Chrome, they want the files loaded from file: URLs be treated as
> > UTF-8 even though the HTTP header isn't there.
> >
> > Non-developer users save files from the Web verbatim without the HTTP
> > headers and open the files from file: URLs. These days, those files
> > are most often in UTF-8 and lack the BOM, and sometimes they lack
> > <meta charset=utf-8>, and plain text files can't even use <meta
> > charset=utf-8>. These users, too, would like a Chrome-like convenience
> > when opening these files from file: URLs in Firefox.
> >
> > # Details
> >
> > If an HTML or plain text file loaded from a file: URL does not contain
> > a UTF-8 error in the first 50 MB, assume it is UTF-8. (It is extremely
> > improbable for text intended to be in a non-UTF-8 encoding to look
> > like valid UTF-8 on the byte level.) Otherwise, behave like at
> > present: assume the fallback legacy encoding, whose default depends on
> > the Firefox UI locale.
> >
> > The 50 MB limit exists to avoid buffering everything when loading a
> > log file whose size is on the order of a gigabyte. 50 MB is an
> > arbitrary size that is significantly larger than "normal" HTML or text
> > files, so that "normal"-sized files are examined with 100% confidence
> > (i.e. the whole file is examined) but can be assumed to fit in RAM
> > even on computers that only have a couple of gigabytes of RAM.
> >
> > The limit, despite being arbitrary, is checked exactly to avoid
> > visible behavior changes depending on how Necko chooses buffer
> > boundaries.
> >
> > The limit is a number of bytes instead of a timeout in order to avoid
> > reintroducing timing dependencies (carefully removed in Firefox 4) to
> > HTML parsing--even for file: URLs.
> >
> > Unless a <meta> declaring the encoding (or a BOM) is found within the
> > first 1024 bytes, up to 50 MB of input is buffered before starting
> > tokenizing. That is, the feature assumes that local files don't need
> > incremental HTML parsing, that lo
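
To make the detection flow described in the quoted Details section concrete, here is a hedged, minimal Rust sketch. It is not the actual Gecko implementation; the limit constant, the input path, and the `looks_like_utf8` helper are illustrative names only.

```rust
use std::fs::File;
use std::io::Read;

// Illustrative constant: the intent proposes 50 MB (possibly lowered to 5 MB).
const DETECTION_LIMIT: usize = 50 * 1024 * 1024;

// Illustrative helper: true if no UTF-8 error is found in the buffered bytes.
fn looks_like_utf8(buffered: &[u8]) -> bool {
    match std::str::from_utf8(buffered) {
        Ok(_) => true,
        // An "error" at the very end of a limit-sized buffer may just be a
        // multi-byte sequence cut off by the limit rather than a real error.
        Err(e) => e.error_len().is_none() && buffered.len() >= DETECTION_LIMIT,
    }
}

fn main() -> std::io::Result<()> {
    // Buffer at most DETECTION_LIMIT bytes of the local file before deciding.
    let mut buffered = Vec::new();
    File::open("page.html")?
        .take(DETECTION_LIMIT as u64)
        .read_to_end(&mut buffered)?;

    if looks_like_utf8(&buffered) {
        println!("no UTF-8 error seen: treat the document as UTF-8");
    } else {
        println!("UTF-8 error found: fall back to the locale-dependent legacy encoding");
    }
    Ok(())
}
```

In the real feature, the buffered bytes would be handed to the HTML tokenizer rather than printed, and a BOM or an early <meta> short-circuits the buffering, as the quoted text describes.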

Re: Dropping support for compiling with MSVC in the near future

2018-12-11 Thread Ted Mielczarek
On Mon, Dec 10, 2018, at 8:29 PM, Kartikaya Gupta wrote:
> This is sort of tangential, but what's the linking story currently?
> Are we still linking with MSVC, or with lld?


We're using lld-link for Windows builds in CI:
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/browser/config/mozconfigs/win64/common-opt#23
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/build/win64/mozconfig.vs-latest#3
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/build/mozconfig.lld-link#8

> I discovered recently
> that the rust toolchain we use in automation on windows tries to use
> link.exe for linking.

We use binaries repackaged from Rust releases, so this is just the way rustc 
works on Windows. We do pass the linker we use for non-Windows platforms down 
to cargo in the Firefox build:
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/config/rules.mk#1009

and we pass down a host linker on Win32 builds because we use a 64-bit rustc:
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/build/moz.configure/rust.configure#343
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/.cargo/config.in#23

> I'm in the process of trying to get standalone
> webrender tests running on taskcluster/windows, and to do that I had
> to ensure that the worker has MSVC installed (I couldn't get it to
> play nice with lld-link.exe from the clang-cl toolchain, but maybe I
> didn't do it right). If we remove support for MSVC generally, are we
> going to remove the MSVC tarballs from tooltool as well?

For other reasons, we will still require the MSVC tarballs for building
in CI: we need the standard library headers and libs, as well as the Windows
SDK, even when using clang-cl.

-Ted


Re: Dropping support for compiling with MSVC in the near future

2018-12-11 Thread Kartikaya Gupta
This makes sense, thanks!


Intent to implement and ship: forced case-sensitive attribute selector matching

2018-12-11 Thread Boris Zbarsky
Summary: When matching CSS attribute selectors against HTML, some 
attribute names lead to case-sensitive value matching, while others use 
ASCII-case-insensitive matching.  The proposed feature adds an 's' flag 
on attribute selectors that forces case-sensitive matching, just like 
the existing 'i' flag forces ASCII-case-insensitive matching.


Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1512386

Standard: https://drafts.csswg.org/selectors-4/#attribute-case

Platform coverage: all

Estimated or target release: Firefox 66.

Preference behind which this will be implemented: None.

Is this feature enabled by default in sandboxed iframes? Yes.

DevTools bug: None needed, as far as I can tell.

Do other browser engines implement this?  Not yet.  This is a new 
addition to the standard.


web-platform-tests: Covered by the tests in
http://w3c-test.org/css/selectors/attribute-selectors/attribute-case/


Is this feature restricted to secure contexts?  No, like other CSS 
syntax features.


Spec stability: Not 100% clear, but I expect it's pretty stable; on the 
spec level this is a tiny change and there's not much controversy about 
which letter to use for the flag, I would think.


Security & Privacy Concerns: None

Web designer / developer use-cases AKA Why a developer would use Feature
X? https://github.com/w3c/csswg-drafts/issues/2101#issue-280846560
describes the main use-case: HTML list styling needs this, because
type="a" and type="A" are different, yet "type" is one of the attributes
whose values HTML matches ASCII-case-insensitively by default, so a
selector can't distinguish them without the new flag.


Example usage: [foo="bar" s] { color: fuchsia }

-Boris