Re: Content sniffing: seeking reliable protection of a text HTTP resource

2015-10-09 Thread ian.melven
On Sunday, October 4, 2015 at 8:50:26 PM UTC-7, Boris Zbarsky wrote:
> On 10/1/15 5:36 PM, Incnis Mrsi wrote:
> > The first is "media type (a.k.a. MIME) sniffing", where the browser
> > overrides the media type/subtype. This is implemented in the
> > toolkit/components/mediasniffer/nsMediaSniffer.cpp component (and
> > possibly others; I don't know).
> 
> Note that these are generally very conservative in their application. 
> There are very few cases in which we will override a server-provided 
> MIME type for a document load, for example (I think the RSS thing might 
> well be the only case, in fact).
> 
> > There is a proposal
> > https://bugzilla.mozilla.org/show_bug.cgi?id=471020 to make the behaviour of
> > Firefox compatible with MS Internet Explorer and
> > https://mimesniff.spec.whatwg.org/#supplied-mime-type-detection-algorithm,
> > using «X-Content-Type-Options: nosniff» to switch the sniffing off.
> 
> Right, but that proposal needs to actually define what it means to 
> switch the sniffing off.  I don't believe the IE behavior is documented 
> anywhere, unfortunately.

Hi,

Anne has worked on creating W3C tests and on standardizing the handling of
nosniff in Fetch (https://fetch.spec.whatwg.org/#x-content-type-options-header),
which might make implementation clearer?

cheers
ian



Re: Content sniffing: seeking reliable protection of a text HTTP resource

2015-10-04 Thread Boris Zbarsky

On 10/1/15 5:36 PM, Incnis Mrsi wrote:

The first is “media type (a.k.a. MIME) sniffing”, where the browser overrides
the media type/subtype. This is implemented in the
toolkit/components/mediasniffer/nsMediaSniffer.cpp component (and
possibly others; I don’t know).


Note that these are generally very conservative in their application. 
There are very few cases in which we will override a server-provided 
MIME type for a document load, for example (I think the RSS thing might 
well be the only case, in fact).



There is a proposal
https://bugzilla.mozilla.org/show_bug.cgi?id=471020 to make the behaviour of
Firefox compatible with MS Internet Explorer and
https://mimesniff.spec.whatwg.org/#supplied-mime-type-detection-algorithm,
using «X-Content-Type-Options: nosniff» to switch the sniffing off.


Right, but that proposal needs to actually define what it means to 
switch the sniffing off.  I don't believe the IE behavior is documented 
anywhere, unfortunately.



The second scenario is a less-known “UTF sniffing”, applicable only to text
media types. The browser respects the type proper, but overrides the
«charset=» label with its own guesses.


Just to be clear, which situations are we talking about here?

For HTML, the behavior is defined in 
https://html.spec.whatwg.org/#determining-the-character-encoding and 
basically says that a UTF-16 or UTF-8 BOM will override a 
transport-layer encoding declaration such as the "charset" bit in the 
Content-Type header.


For CSS, the behavior is defined in 
http://www.w3.org/TR/css3-syntax/#input-byte-stream which basically uses 
https://encoding.spec.whatwg.org/#decode which once again looks at the 
BOM and only if one is missing considers other sources of encoding 
information (like HTTP headers).


For text/plain and other types that trigger "show as text" processing in 
the browser, the relevant spec is 
https://html.spec.whatwg.org/#read-text which defers to the 
specifications for the relevant MIME type.  So arguably for text/css we 
should consider the BOM before other things, while for text/plain we 
should do what RFC 2046 defines for text/plain.  Unfortunately, what 
that RFC defines is to use the "us-ascii" encoding if there is no 
charset parameter supplied, which is not really a useful thing to do on 
the web today.  So I suspect that what we do in practice is exactly the 
same thing as for HTML.
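
To make the precedence concrete, here is a minimal sketch of the "BOM first,
transport-layer charset second" order of checks described above (in Python;
the function and parameter names are illustrative, not anything from Gecko):

    # Hedged sketch of the BOM-first decode precedence; not the actual
    # Gecko code path, just the order of checks discussed above.
    import codecs

    def decode_resource(data, transport_charset=None, fallback="utf-8"):
        if data.startswith(codecs.BOM_UTF8):
            return data[len(codecs.BOM_UTF8):].decode("utf-8", errors="replace")
        if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
            return data.decode("utf-16", errors="replace")  # codec consumes the BOM
        if transport_charset:  # e.g. the charset= parameter of Content-Type
            return data.decode(transport_charset, errors="replace")
        return data.decode(fallback, errors="replace")  # context-dependent in the specs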



This is implemented in
netwerk/base/nsUnicharStreamLoader.cpp


There is no sniffing I see there.  It just hands the initial bytes and 
network headers to its consumers and asks them to pick an encoding.


In practice the only consumer is CSS, which is already discussed above.


HTML5 encoding sniffing that isn’t applicable (reasonably) to
text/plain.


Why not, if I might ask?


In the case of text/plain it leads to bugs. Simple test
cases are available at http://course.irccity.ru/ya-yu-9-amp.txt (toxic
UTF-16 “BOM”)


Why is this particularly a problem for text/plain but not text/html?  If 
your BOM doesn't match your text, you will have a bad time...


Notably, opening this file in a text editor will show U+2639, because a 
BOM is what a text editor has to go on.
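
For concreteness, here is a hedged reconstruction of that test case, assuming
the file is windows-1251 text whose first two bytes happen to coincide with a
UTF-16 LE BOM (я = 0xFF, ю = 0xFE):

    # Windows-1251 text "яю9&" begins with the bytes ff fe, i.e. a UTF-16 LE BOM.
    data = "яю9&".encode("windows-1251")   # b'\xff\xfe9&'
    print(data.decode("windows-1251"))     # what the charset= label says: яю9&
    print(data.decode("utf-16"))           # what BOM-first sniffing yields: ☹ (U+2639)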



and http://course.irccity.ru/p-guillemet-yi-ya.txt (toxic
UTF-8 “BOM”). It poses less immediate security risk


Indeed.


but still can cause
data corruption whenever arbitrary data are allowed into (the beginning of)
text/plain documents.


True.  On the other hand, if you're allowing arbitrary injection of 
untrusted content into your document you also have to worry about 
injection of U+202E (RIGHT-TO-LEFT OVERRIDE) and other fun things, no?



The toxic UTF sniffing was observed in Firefox,
MSIE, Google Chrome, and Safari


Right, because I assume they all basically did the same thing: observed 
that the spec for how encodings should be handled for text/plain is old 
and that direct application of it is daft on the web today, and since they're 
sending the data through the HTML parser _anyway_, they just reused the HTML 
parser's encoding codepath.



Possible approaches to the toxic UTF sniffing include:
• Just fix it (certainly would cause backlash from people eager to burn
anything except UTF-8).


Well, and is also likely to break some documents that are out there with 
bogus charset parameters right now.


I assume by "just fix it" you mean "reverse the precedence of BOM and 
the charset parameter of Content-Type", not "follow RFC 2046 to the 
letter"?  But if you mean the latter, please do say so, because then we 
can just stop this conversation right now.



• Make a new Firefox preference value (e.g.
network.http.charset_quirk_level) controlling the browser’s behaviour.


What's the point?  When would this ever be useful?


• Make patches for the source code to be used only by those who are
interested.


And of course:

• Say it's not a problem in any reasonable scenario

right?

-Boris

Content sniffing: seeking reliable protection of a text HTTP resource

2015-10-01 Thread Incnis Mrsi

Hello.

This message asks for opinions and suggestions on how to make 
Mozilla products interpret the «Content-Type» of a Web resource 
exactly as specified in the HTTP(S) headers.


Content sniffing in browsers is a compromise between standards 
and interoperability with “poor” Web sites. 
It creates vulnerabilities and, generally, breaks compatibility with (original) HTTP/1.1. 
In some cases it conceals protocol data from end-user tools such as Ctrl-I (Page Info). 
See http://www.superstructure.info/browser/compromised/toxic-sniffing.html 
for some less widely known information about it.


I have particular concerns about two scenarios.
The first is “media type (a.k.a. MIME) sniffing”, 
where the browser overrides the media type/subtype. 
This is implemented in the toolkit/components/mediasniffer/nsMediaSniffer.cpp component 
(and possibly others; I don’t know). 
There is a proposal https://bugzilla.mozilla.org/show_bug.cgi?id=471020 
to make the behaviour of Firefox compatible with MS Internet Explorer 
and https://mimesniff.spec.whatwg.org/#supplied-mime-type-detection-algorithm, 
using «X-Content-Type-Options: nosniff» to switch the sniffing off.
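
For reference, a server opts into this by sending the header on its responses;
here is a toy sketch in Python (the port and body are illustrative only):

    # Minimal sketch: serve text/plain with "X-Content-Type-Options: nosniff"
    # so a compliant browser takes the declared Content-Type at face value.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class NoSniffHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = "plain text, to be taken literally\n".encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("X-Content-Type-Options", "nosniff")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("127.0.0.1", 8000), NoSniffHandler).serve_forever()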


The second scenario is a less-known “UTF sniffing”, 
applicable only to text media types. The browser respects the type proper, 
but overrides the «charset=» label with its own guesses. 
This is implemented in netwerk/base/nsUnicharStreamLoader.cpp; 
this implementation is based on HTML5 encoding sniffing 
that isn’t applicable (reasonably) to text/plain. 
In the case of text/plain it leads to bugs. Simple test cases are available 
at http://course.irccity.ru/ya-yu-9-amp.txt (toxic UTF-16 “BOM”) 
and http://course.irccity.ru/p-guillemet-yi-ya.txt (toxic UTF-8 “BOM”). 
It poses less immediate security risk, but still can cause data corruption 
whenever arbitrary data are allowed into (the beginning of) text/plain documents. 
The toxic UTF sniffing was observed in Firefox, MSIE, Google Chrome, and Safari, 
and doesn’t seem to correlate with the «X-Content-Type-Options» header mentioned above.

Possible approaches to the toxic UTF sniffing include:
• Just fix it (certainly would cause backlash from people eager to burn 
anything except UTF-8).
• Something along the lines of the no-sniff flag.
• Make a new Firefox preference value 
  (e.g. network.http.charset_quirk_level) controlling the browser’s behaviour.
• Make patches for the source code to be used only by those who are interested.

Possible approaches to the relation between the two scenarios include:
• Extend the meaning of «X-Content-Type-Options: nosniff» to also ban the 
toxic UTF sniffing.
• Make the interpretation of «X-Content-Type-Options» depend on preferences.
• Invent a new value for X-Content-Type-Options, or an entirely new header, 
in the hope that other browsers and Web applications will ultimately adopt it.
• Treat the two problems completely separately.

Opinions?

Please note that I’m not (yet) a browser developer; 
my main agenda is making a browser I could trust myself.


Regards, Incnis Mrsi
