Re: File API: Blob.type

2013-04-05 Thread Arun Ranganathan
On Apr 5, 2013, at 3:17 PM, Alexey Proskuryakov wrote:

> 
> On Apr 3, 2013, at 1:11 PM, Arun Ranganathan wrote:
> 
>>> My only concern is that blob.type should never contain parameters.  
>>> Comparing it to "text/plain" or "image/jpeg" should work, and not 
>>> mysteriously fail a year later when somebody eventually throws a MIME type 
>>> parameter into the mix.  Today, all browsers expose text files at 
>>> text/plain.  If a browser a year from now decides to call text files with a 
>>> UTF-8 BOM "text/plain; charset=UTF-8", it'll break interop.
> 
> What specifies how a File gets its type? The only requirement I can find is 
> that "User agents must not attempt heuristic determination of type", which I 
> think implies that something like inputElement.files[0].type is always "" for 
> a file chosen by a user via .


The spec. now overreaches a bit :-( 

Not allowing heuristic mechanisms was merely to restrict encoding determination 
as per at least one implementation's experience with it being substandard: 
https://bugzilla.mozilla.org/show_bug.cgi?id=848842

But now maybe we're going a bit far.  Should we standardize how UAs do 
auto-detect of file type, including something about extensions and some BOM 
methods?  This seems to be complicated and may be unnecessary -- most UAs do 
this just about right in the absence of a standard.


> Guessing MIME type from file name or metadata is always a heuristic, as not 
> all platforms will know that "archive.sit" means "application/x-stuffit".
> 
> At the same time, browsers do autodetect types for many files. We'll need to 
> autodetect when serializing a form for submission anyway, so exposing this 
> information a little earlier only makes sense.
> 
> I think that these concerns can be resolved by specifying what File.type is 
> more explicitly. The spec can just say that parameters are not allowed in the 
> browser chosen type.


That seems sensible!  By *not* allowing charset parameters in types determined 
by UAs, these are now set by web applications only, which may mitigate Glenn's 
concerns.

Maybe the way forward is to leave this to UAs, and:

1. Say UAs should return file type, if known.
2. Say UAs must not use heuristics or statistical methods to determine encoding.
3. Say UAs must not set the charset parameter in the returned type for text/plain.  
This will then defer to the encoding spec. and attempt fallback decoding.  
Where a web application sets a charset parameter, this will do the right thing 
for readAsText with fallback decoding.
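
A minimal sketch of the lookup order this implies for a text read: an explicit
label from the caller first, then a charset parameter that a web application
put in the type, then a fallback. The helper names `charsetOf` and
`pickEncodingLabel` are hypothetical, the parameter parsing is deliberately
naive, and the UTF-8 default is an assumption for illustration only.

```javascript
// Hypothetical helper: pull a charset parameter out of a type string.
// Returns the lowercased label, or null when no charset parameter is present.
function charsetOf(type) {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(type);
  return match ? match[1].toLowerCase() : null;
}

// Pick the label a text read might use: an explicit label first, then a
// charset carried in the type, then the fallback (assumed to be UTF-8 here).
function pickEncodingLabel(type, explicitLabel, fallback = "utf-8") {
  return explicitLabel || charsetOf(type) || fallback;
}
```

Since a UA-determined type would carry no charset under rule 3, a file picked
by the user always ends up on the explicit-label or fallback path.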

> 
>>> Additionally, determining a blob's file type seems like the most obvious 
>>> use of this property, and making people say "if(blob.type.split(";")[0] == 
>>> 'text/plain')" is simply not a good interface.
>> 
>> 
>> OK -- you're strongly opinionated on the matter of NOT allowing a charset 
>> parameter.  I'd like to see if implementers who had an opinion on its 
>> usefulness can weigh in -- Darin?  Alexey?
> 
> 
> I do not have a very strong opinion. I like the simpler API of passing 
> parameters through the type attribute, as it's specified currently. This also 
> matches XMLHttpRequest API better. And of course, keeping existing behavior 
> means that we won't break the web.

I like it too.  We keep charset, but don't let user agents set it for 
auto-detected files; it can only be set with a Blob constructor or a slice 
call.  Blob.type is a string that can be set by developers and has normative 
requirements that are not strict tokenization requirements, so I think we're 
fine here.

-- A*

Re: File API: Blob.type

2013-04-05 Thread Alexey Proskuryakov

On Apr 3, 2013, at 1:11 PM, Arun Ranganathan wrote:

>> My only concern is that blob.type should never contain parameters.  
>> Comparing it to "text/plain" or "image/jpeg" should work, and not 
>> mysteriously fail a year later when somebody eventually throws a MIME type 
>> parameter into the mix.  Today, all browsers expose text files at 
>> text/plain.  If a browser a year from now decides to call text files with a 
>> UTF-8 BOM "text/plain; charset=UTF-8", it'll break interop.

What specifies how a File gets its type? The only requirement I can find is 
that "User agents must not attempt heuristic determination of type", which I 
think implies that something like inputElement.files[0].type is always "" for a 
file chosen by a user via .

Guessing MIME type from file name or metadata is always a heuristic, as not all 
platforms will know that "archive.sit" means "application/x-stuffit".

At the same time, browsers do autodetect types for many files. We'll need to 
autodetect when serializing a form for submission anyway, so exposing this 
information a little earlier only makes sense.

I think that these concerns can be resolved by specifying what File.type is 
more explicitly. The spec can just say that parameters are not allowed in the 
browser chosen type.

>> Additionally, determining a blob's file type seems like the most obvious use 
>> of this property, and making people say "if(blob.type.split(";")[0] == 
>> 'text/plain')" is simply not a good interface.
> 
> 
> OK -- you're strongly opinionated on the matter of NOT allowing a charset 
> parameter.  I'd like to see if implementers who had an opinion on its 
> usefulness can weigh in -- Darin?  Alexey?


I do not have a very strong opinion. I like the simpler API of passing 
parameters through the type attribute, as it's specified currently. This also 
matches XMLHttpRequest API better. And of course, keeping existing behavior 
means that we won't break the web.

- WBR, Alexey Proskuryakov




Re: File API: Blob.type

2013-04-03 Thread Arun Ranganathan
On Mar 19, 2013, at 8:52 PM, Glenn Maynard wrote:

> On Tue, Mar 19, 2013 at 1:41 PM, Arun Ranganathan  wrote: 
> 
> > 2.Convert every character in relativeContentType to lower case.
> 
> I recommend referencing "Converting a string to ASCII lowercase" in HTML.  
> http://www.whatwg.org/specs/web-apps/current-work/#converted-to-ascii-lowercase


Done.

> 
> > 1.If relativeContentType contains any non-ASCII characters, then set 
> > relativeContentType to the empty string and return from these substeps.
> > 3.If relativeContentType contains any line break characters like "CR" 
> > or "LF" or any CTLs or separators, then set relativeContentType to the 
> > empty string and return from these substeps.
> 
> #3 is too vague.  I recommend combining #1 and #3, saying: "If any character 
> in relativeContentType outside of the range U+0020 to U+007E".  That's the 
> printable ASCII range, and excludes all control characters.
> 


Done (+ thanks).


> > 4.Parse relativeContentType as an RFC2616 media-type, tokenizing it 
> > according to the ABNF for media-type [RFC2616] with the ASCII "/" character 
> > separating tokens representing the type and subtype productions. If 
> > relativeContentType cannot be tokenized according to the ABNF for 
> > media-type [RFC2616], then set relativeContentType to the empty string and 
> > return from these substeps.
> 
> I'm not sure we should be this strict.  I'd lean towards keeping it simple, 
> allowing any string at all as long as it contains only lowercase, printable 
> ASCII.


Done -- we restrict it now, but don't mandate tokenization along the lines of 
RFC2616.
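
A rough sketch of the restriction as discussed above, not the spec. text
itself: reject anything outside U+0020 to U+007E by falling back to the empty
string, otherwise ASCII-lowercase, with no media-type tokenization. The
function name `normalizeType` is hypothetical.

```javascript
// Returns the value that would be kept as the type: the empty string if any
// character falls outside U+0020..U+007E, otherwise the ASCII-lowercased
// input. No attempt is made to tokenize it as an RFC2616 media-type.
function normalizeType(relativeContentType) {
  if (/[^\u0020-\u007E]/.test(relativeContentType)) {
    return "";
  }
  // The string is ASCII-only at this point, so mapping A-Z is an ASCII lowercase.
  return relativeContentType.replace(/[A-Z]/g, (c) => c.toLowerCase());
}
```

Note that a string such as "not a mime type" passes through unchanged under
this rule, which is the point of not mandating tokenization.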


> You don't need to say "The following requirements are normative for this 
> parameter".  That's what the normative language that follows ("must") means.


Done.


> My only concern is that blob.type should never contain parameters.  Comparing 
> it to "text/plain" or "image/jpeg" should work, and not mysteriously fail a 
> year later when somebody eventually throws a MIME type parameter into the 
> mix.  Today, all browsers expose text files at text/plain.  If a browser a 
> year from now decides to call text files with a UTF-8 BOM "text/plain; 
> charset=UTF-8", it'll break interop.
> 
> Additionally, determining a blob's file type seems like the most obvious use 
> of this property, and making people say "if(blob.type.split(";")[0] == 
> 'text/plain')" is simply not a good interface.


OK -- you're strongly opinionated on the matter of NOT allowing a charset 
parameter.  I'd like to see if implementers who had an opinion on its 
usefulness can weigh in -- Darin?  Alexey?

http://dev.w3.org/2006/webapi/FileAPI/

-- A*

Re: File API: Blob.type

2013-03-19 Thread Glenn Maynard
On Tue, Mar 19, 2013 at 1:41 PM, Arun Ranganathan  wrote:

> Stricter rules are in place for "type" both while constructing Blob and
> for slice calls:
>
> http://dev.w3.org/2006/webapi/FileAPI/#constructorBlob
>
> and
>
> http://dev.w3.org/2006/webapi/FileAPI/#slide-method-algo
>


> 2.Convert every character in relativeContentType to lower case.

I recommend referencing "Converting a string to ASCII lowercase" in HTML.
http://www.whatwg.org/specs/web-apps/current-work/#converted-to-ascii-lowercase

> 1.If relativeContentType contains any non-ASCII characters, then set
> relativeContentType to the empty string and return from these substeps.
> 3.If relativeContentType contains any line break characters like "CR"
> or "LF" or any CTLs or separators, then set relativeContentType to the
> empty string and return from these substeps.

#3 is too vague.  I recommend combining #1 and #3, saying: "If any
character in relativeContentType outside of the range U+0020 to U+007E".
That's the printable ASCII range, and excludes all control characters.

> 4.Parse relativeContentType as an RFC2616 media-type, tokenizing it
> according to the ABNF for media-type [RFC2616] with the ASCII "/" character
> separating tokens representing the type and subtype productions. If
> relativeContentType cannot be tokenized according to the ABNF for
> media-type [RFC2616], then set relativeContentType to the empty string and
> return from these substeps.

I'm not sure we should be this strict.  I'd lean towards keeping it simple,
allowing any string at all as long as it contains only lowercase, printable
ASCII.

If we really want to be that strict, I recommend specifying what to do
directly instead of by reference to RFC2616.  It's not a very clear
specification, at least for the purposes it's being used for here.  I
recommend not using it as a normative reference at all.

You don't need to say "The following requirements are normative for this
parameter".  That's what the normative language that follows ("must") means.


> So the "type" attribute of a Blob object isn't the *literal* value of the
> header; it's the type of the Blob, expressed as a MIME type.  When
> dereferencing Blob URLs, you get this type back with the Content-Type
> header, as you do normally in HTTP scenarios.  This is a well-understood
> behavior, and I agree with points you've made about not being beholden to
> the RFC when designing an API.
>

For what it's worth, while I'm familiar with how the Content-Type header
works, this wasn't at all clear to me.  To me, a MIME type is
"type/subtype", parameters like charset are metadata included next to a
MIME type (not part of the MIME type itself), and I wouldn't hesitate at
all to say "if(blob.type == 'text/plain')".

(I think the RFC is simply vague on this point, and I'm sure other people
have different interpretations--the point is just that this is a
reasonable, intuitive view of MIME types.)
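
To make the concern concrete, here is a small sketch using plain objects with
a `type` property standing in for Blobs. The helper names
`typeWithoutParameters` and `hasType` are hypothetical, and the split is the
same naive one quoted in this thread.

```javascript
// Drop everything from the first ";" on, leaving "type/subtype".
function typeWithoutParameters(type) {
  return type.split(";")[0].trim();
}

// Compare only the "type/subtype" part of an object's type attribute.
function hasType(blobLike, expected) {
  return typeWithoutParameters(blobLike.type) === expected;
}

const plain = { type: "text/plain" };
const withCharset = { type: "text/plain;charset=utf-8" };

// The direct comparison works for the first object and fails for the second.
console.log(plain.type === "text/plain");       // true
console.log(withCharset.type === "text/plain"); // false
console.log(hasType(withCharset, "text/plain")); // true
```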

> I think the question here is whether or not to include *separate
> attributes* on the Blob interface for the rarely used Charset Parameter,
> namely anything after the semicolon in MIME types of the sort:
> "text/plain;charset=UTF-8".  I've considered all your arguments by way of
> developer advocacy, and actually think we'll do developers a disservice by
> adding to the Blob interface:
>
> 1. The Charset Parameter consideration applies only to text/plain.  There
> are numerous other MIME types that don't use it: application/*, audio/*,
> image/*, video/*, etc.  Complicating the interface on the off-chance that a
> stray use of the Charset parameter breaks a direct equality comparison is
> "too much API for too little."
>
> 2. The Charset Parameter even in the context of text/plain isn't common
> enough to warrant a special case for text/plain within the API.
>
> 3. In general, it's a pretty stable assumption to conclude that developers
> will expect "type" to be surfaced later along with "Content-Type" when
> dereferencing a Blob URI.  I don't think we've made an assumption that's
> terribly galling.
>

I'm not concerned with exposing parameters; I don't think it's important,
or even necessarily useful.  I only suggested it as an alternative, if the
functionality of being able to manipulate MIME type parameters is wanted.
You're arguing that it's a rarely used special case, which is an argument
for not exposing it at all (not for leaking the special case into .type).

My only concern is that blob.type should never contain parameters.
Comparing it to "text/plain" or "image/jpeg" should work, and not
mysteriously fail a year later when somebody eventually throws a MIME type
parameter into the mix.  Today, all browsers expose text files at
text/plain.  If a browser a year from now decides to call text files with a
UTF-8 BOM "text/plain; charset=UTF-8", it'll break interop.

Additionally, determining a blob's file type seems like the most obvious
use of this property, and making people say "if(blob.type.split(";")[0] ==
'text/

Re: File API: Blob.type

2013-03-19 Thread Arun Ranganathan
Alexey,


On Mar 7, 2013, at 3:02 PM, Alexey Proskuryakov wrote:

> 
> The current File API spec seems to have a mismatch between type in 
> BlobPropertyBag, and type as Blob attribute. The latter declaratively states 
> that the type is an ASCII lower case string. As mentioned by Glenn before, 
> WebKit interpreted this by raising an exception in constructor for non-ASCII 
> input, and lowercasing the string. I think that this is a reasonable reading 
> of the spec. I'd be fine with raising exceptions for invalid types more 
> eagerly.
> 
> This is the text in question:
> 
> (1)
>> type, a DOMString which corresponds to the Blob object's type attribute. If 
>> not the empty string, user agents must treat it as an RFC2616 media-type 
>> [RFC2616], and as an opaque string that can be ignored if it is an invalid 
>> media-type. This value must be used as the Content-Type header when 
>> dereferencing a Blob URI.
>> 
> 
> 
> (2)
>> type
>> The ASCII-encoded string in lower case representing the media type of the 
>> Blob, expressed as an RFC2046 MIME type [RFC2046]. On getting, conforming 
>> user agents must return the MIME type of the Blob, if it is known. If 
>> conforming user agents cannot determine the media type of the Blob, they 
>> must return the empty string. A string is a valid MIME type if it matches 
>> the media-type token defined in section 3.7 "Media Types" of RFC 2616 
>> [RFC2616]. If not the empty string, user agents must treat it as an RFC2616 
>> media-type [RFC2616], and as an opaque string that can be ignored if it is 
>> an invalid media-type. This value must be used as the Content-Type header 
>> when dereferencing a Blob URI.


This is now clarified; the mismatch is a spec. bug.  Thanks for pointing this 
out.


> It would be helpful to have the terminology corrected, and to have this 
> generally clarified - for example, validity is mentioned here, but seems to 
> be unused.
> 


Conditions for validity have been clarified; this doesn't warrant throwing a 
SyntaxError, but it does specify when implementations should ignore poor use of 
MIME type strings, e.g. here's additional clarification in the slice call:

http://dev.w3.org/2006/webapi/FileAPI/#slide-method-algo


> It seems pretty clear from normative text that charset parameter is supposed 
> to work. A non-normative example supports that too. I agree with Arun that 
> this seems best to keep as is.

+1.


> However,  is about a 
> different case - it's about posting multipart form data that has Blob 
> elements with invalid media-types. I'm not even sure which spec is in charge 
> of this behavior - I don't think that anything anywhere says that Blob.type 
> affects media-type of posted multipart data, even though that's obviously the 
> intention. XMLHttpRequest spec defers to HTML, which defers to RFC2388, which 
> mentions files "returned via filling out a form", but not Blobs (which is no 
> surprise given its age).


In fact, I'm not sure if Blob.type should influence the type of multipart form 
data.  Consider the concatenation of several Blobs into a new Blob, as the Blob 
constructor allows.  What should the type of a newly constructed Blob be,  if 
it consists of several differently typed Blobs?  The spec. suggests 
disregarding the type of each Blob, but encourages the right use of type within 
the Blob constructor.  

I'm also not sure multipart form data falls under the aegis of the File API, 
but at least a Blob with an invalid type is now the same as one having no type 
(empty string).
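
A short sketch of the concatenation case, assuming a global `Blob`
constructor is available (browsers, and recent Node.js releases as far as I
know). The function name `concatenationDemo` is hypothetical.

```javascript
// Build a Blob out of two differently typed Blobs, once without a type and
// once with an explicit one, and report what each constructor call yields.
function concatenationDemo() {
  const textPart = new Blob(["hello"], { type: "text/plain" });
  const htmlPart = new Blob(["<p>hi</p>"], { type: "text/html" });

  // No type given: the types of the parts are disregarded.
  const untyped = new Blob([textPart, htmlPart]);

  // Type given: the author says what the concatenation is.
  const typed = new Blob([textPart, htmlPart], { type: "text/html" });

  return {
    untypedType: untyped.type,
    typedType: typed.type,
    size: untyped.size,
  };
}
```

The parts contribute bytes only; whatever type the result has comes from the
constructor call.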


> Making Blobs only hold valid media-types would solve practical issues, but it 
> would be helpful to know what formally defines multipart data serialization 
> with blobs.
> 
> We also previously had 
>  for sending 
> non-multipart data. Back then, we determined that "Content-Type: " should be 
> sent when the value is invalid. I'm no longer sure if that's right. For this 
> case, XMLHttpRequest authoritatively defines the behavior, although heavily 
> leaning on File API to decide when the type attribute is empty:
> 
>> If the object's type attribute is not the empty string let mime type be its 
>> value.
> 
> 
> Note that "mime type" is then directly used as default media-type for 
> Content-Type header, but it's not parsed to set encoding variable. The 
> encoding could be needed to update a charset in author provided Content-Type 
> header field in later steps of the algorithm. This is probably not right, as 
> Blob should know its encoding better than code that sets header fields on an 
> XMLHttpRequest object.
> 


Yes, but implementations can't heuristically determine a Blob's type now.  Type 
has to be specified correctly or ignored.   What "Blob should know" is now as 
good as what it is constructed to have as its type, though at read time, thanks 
to the Encoding Spec, we can determine a fallback encoding.

-- A

Re: File API: Blob.type

2013-03-19 Thread Arun Ranganathan

On Mar 7, 2013, at 7:19 PM, Glenn Maynard wrote:


> Chrome, at least, throws on new Blob([], {type: "漢字"}), as well as 
> lowercasing the string.
> 


Stricter rules are in place for "type" both while constructing Blob and for 
slice calls:

http://dev.w3.org/2006/webapi/FileAPI/#constructorBlob

and 

http://dev.w3.org/2006/webapi/FileAPI/#slide-method-algo

I agree with previous comments you've made about ByteString not solving any 
problems that Anne vK brings up; instead, I think using DOMString is probably 
ok, with tighter rules on what is valid and what should be ignored.  Throwing a 
SyntaxError might be overkill to developers and a bit too punitive; instead, I 
advocate sticking with the original spirit of the opaque string idea and 
ignoring bad use of "type."


> A couple points:
> 
> - I disagree that we should discourage comparing against Blob.type, but 
> ultimately it's such an obvious use of the property, people will do it 
> whether it's encouraged or not.  I'd never give it a second thought, since 
> that appears to be its very purpose.  Web APIs should be designed defensively 
> around how people will actually use the API, not how we wish they would.  
> Unless lots of Blob.type values actually include parameters, code will 
> break unexpectedly when it ends up encountering one.
> - The RFC defines a protocol ("Content-Type"), not a JavaScript API, and 
> good protocols are rarely good APIs.  Having Blob.type be the literal value 
> of a Content-Type header isn't an elegant API.  You shouldn't need to do 
> parsing of a string value to extract "text/plain", and you shouldn't have to 
> do serialization to get "text/plain; charset=UTF-8".
> 

So the "type" attribute of a Blob object isn't the *literal* value of the 
header; it's the type of the Blob, expressed as a MIME type.  When 
dereferencing Blob URLs, you get this type back with the Content-Type header, 
as you do normally in HTTP scenarios.  This is a well-understood behavior, and 
I agree with points you've made about not being beholden to the RFC when 
designing an API.  

I think the question here is whether or not to include *separate attributes* on 
the Blob interface for the rarely used Charset Parameter, namely anything after 
the semicolon in MIME types of the sort: "text/plain;charset=UTF-8".  I've 
considered all your arguments by way of developer advocacy, and actually think 
we'll do developers a disservice by adding to the Blob interface:

1. The Charset Parameter consideration applies only to text/plain.  There are 
numerous other MIME types that don't use it: application/*, audio/*, image/*, 
video/*, etc.  Complicating the interface on the off-chance that a stray use of 
the Charset parameter breaks a direct equality comparison is "too much API for 
too little."

2. The Charset Parameter even in the context of text/plain isn't common enough 
to warrant a special case for text/plain within the API.

3. In general, it's a pretty stable assumption to conclude that developers will 
expect "type" to be surfaced later along with "Content-Type" when dereferencing 
a Blob URI.  I don't think we've made an assumption that's terribly galling.

-- A*


Re: File API: Blob.type

2013-03-08 Thread Glenn Maynard
On Fri, Mar 8, 2013 at 3:43 AM, Anne van Kesteren  wrote:

> On Thu, Mar 7, 2013 at 6:35 PM, Arun Ranganathan
>  wrote:
> > But I'm not sure about why we'd choose ByteString in lieu of being strict
> > with what characters are allowed within DOMString.  Anne, can you shed
> some
> > light on this?  And of course we should eliminate CR + LF as a
> possibility
> > at constructor invocation time, possibly by throwing.
>
> MIME/HTTP consists of byte sequences, not code points. ByteString is a
> basic JavaScript string with certain restrictions on it to match the
> byte sequence semantics, while still behaving like a string.
>

MIME types are definitely strings of codepoints.  They're just strings.  We
wouldn't make 

Re: File API: Blob.type

2013-03-08 Thread Anne van Kesteren
On Thu, Mar 7, 2013 at 6:35 PM, Arun Ranganathan
 wrote:
> But I'm not sure about why we'd choose ByteString in lieu of being strict
> with what characters are allowed within DOMString.  Anne, can you shed some
> light on this?  And of course we should eliminate CR + LF as a possibility
> at constructor invocation time, possibly by throwing.

MIME/HTTP consists of byte sequences, not code points. ByteString is a
basic JavaScript string with certain restrictions on it to match the
byte sequence semantics, while still behaving like a string.
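
For comparison, a sketch of what a ByteString-style check admits next to a
printable-ASCII check. This models ByteString only as "every code unit is at
most 0xFF", which is my understanding of the Web IDL conversion; the function
names are hypothetical.

```javascript
// ByteString-style check: every UTF-16 code unit must fit in one byte.
function isByteStringLike(s) {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

// Printable-ASCII check: every character must be in U+0020..U+007E.
function isPrintableAscii(s) {
  return /^[\u0020-\u007E]*$/.test(s);
}
```

Under this model a string containing CR, LF, or Latin-1 characters is still
an acceptable ByteString, so further restrictions are needed either way.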


-- 
http://annevankesteren.nl/



Re: File API: Blob.type

2013-03-07 Thread Glenn Maynard
As an aside, I'd recommend minimizing normative dependencies on RFC2046.
Like many RFCs it's an old, unclear spec.

On Thu, Mar 7, 2013 at 12:35 PM, Arun Ranganathan 
wrote:
> At some point there was a draft that specified *strict* parsing for
> compliance with RFC2046, including tokenization ("/") and eliminating
> non-ASCII cruft.  But we scrapped that because bugs in all major browser
> projects showed that this spec. text was callously ignored.  And I didn't
> want to spec. fiction, so we went with the current model for Blob.type,
> which is, as Anne points out, pretty lax.

Chrome, at least, throws on new Blob([], {type: "漢字"}), as well as
lowercasing the string.

> I'm in favor of introducing stricter rules for Blob.type, and I'm also in
> favor of allowing charset params; Glenn's example of  'if(blob.type ==
> "text/plain")' will break, but I don't think we should be encouraging
> strict equality comparisons on blob.type (and in fact, should *discourage*
> it as a practice).
>
> Glenn: I think that introducing a separate interface for other parameters
> actually takes away from the elegance of a simple Blob.type.  The RFC
> doesn't separate them, and I'm not sure we should either.  My reading of
> the RFC is that parameters *are an intrinsic part of* the MIME type.

A couple points:

- I disagree that we should discourage comparing against Blob.type, but
ultimately it's such an obvious use of the property, people will do it
whether it's encouraged or not.  I'd never give it a second thought, since
that appears to be its very purpose.  Web APIs should be designed
defensively around how people will actually use the API, not how we wish
they would.  Unless lots of Blob.type values actually include
parameters, code will break unexpectedly when it ends up encountering one.
- The RFC defines a protocol ("Content-Type"), not a JavaScript API, and
good protocols are rarely good APIs.  Having Blob.type be the literal value
of a Content-Type header isn't an elegant API.  You shouldn't need to do
parsing of a string value to extract "text/plain", and you shouldn't have
to do serialization to get "text/plain; charset=UTF-8".

(My reading of RFC2046 is different, but either way I don't think the
intent of that RFC should determine the design of this API, at least on
this point.  It's a spec designed with completely different goals than a
JavaScript API.)
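
As an illustration of the parsing and serialization burden described above,
here is a deliberately naive sketch. `parseType` and `serializeType` are
hypothetical names, and the parser does not handle quoted parameter values
that contain ";" or "=".

```javascript
// Split a header-style value into its "type/subtype" part and a map of
// parameters. Parameter names are lowercased; values are kept as written.
function parseType(value) {
  const [first, ...rest] = value.split(";");
  const parameters = {};
  for (const part of rest) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip anything that is not name=value
    const name = part.slice(0, eq).trim().toLowerCase();
    parameters[name] = part.slice(eq + 1).trim();
  }
  return { type: first.trim().toLowerCase(), parameters };
}

// Rebuild a header-style value from the parsed form.
function serializeType({ type, parameters }) {
  const pairs = Object.entries(parameters).map(([name, v]) => `${name}=${v}`);
  return [type, ...pairs].join("; ");
}
```

A caller who only wants to know whether something is text/plain has to go
through `parseType(blob.type).type` rather than a direct comparison.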


On Thu, Mar 7, 2013 at 2:02 PM, Alexey Proskuryakov  wrote:

> The current File API spec seems to have a mismatch between type in
> BlobPropertyBag, and type as Blob attribute. The latter declaratively
> states that the type is an ASCII lower case string. As mentioned by Glenn
> before, WebKit interpreted this by raising an exception in constructor for
> non-ASCII input, and lowercasing the string. I think that this is a
> reasonable reading of the spec. I'd be fine with raising exceptions for
> invalid types more eagerly.
>

With the file API spec as currently written, there's no normative text
saying to throw an exception, so WebKit's interpretation is incorrect, but
it's simple to fix.  In 7.1 (Constructors), add a step that says "If the
type member of the options argument is set, and contains any Unicode
codepoints less than U+0020 or greater than U+007E, throw a SyntaxError
exception and abort these steps."

(WebKit actually only throws outside of [0,0x7F].  This language throws
outside of [0x20,0x7E], excluding control characters.)
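
The difference between the two ranges can be sketched as follows. Both
function names are hypothetical, and throwing SyntaxError in the
WebKit-style variant is an assumption made only to keep the comparison
simple; the ranges themselves are the ones given above.

```javascript
// Throw if any code unit of `type` falls outside [lowest, highest];
// otherwise return the lowercased string (ASCII-only by then).
function checkType(type, lowest, highest) {
  for (let i = 0; i < type.length; i++) {
    const unit = type.charCodeAt(i);
    if (unit < lowest || unit > highest) {
      throw new SyntaxError("type contains a disallowed character");
    }
  }
  return type.toLowerCase();
}

// Proposed language: printable ASCII only, U+0020..U+007E.
function checkTypeProposed(type) {
  return checkType(type, 0x20, 0x7e);
}

// Range described for WebKit: anything in [0, 0x7F], controls included.
function checkTypeWebKitLike(type) {
  return checkType(type, 0x00, 0x7f);
}
```

A type with a trailing line feed passes the second check but not the first;
non-ASCII input fails both.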

I'd suggest importing WebKit's lowercasing of .type, too, in the same place.

-- 
Glenn Maynard


Re: File API: Blob.type

2013-03-07 Thread Alexey Proskuryakov

The current File API spec seems to have a mismatch between type in 
BlobPropertyBag, and type as Blob attribute. The latter declaratively states 
that the type is an ASCII lower case string. As mentioned by Glenn before, 
WebKit interpreted this by raising an exception in constructor for non-ASCII 
input, and lowercasing the string. I think that this is a reasonable reading of 
the spec. I'd be fine with raising exceptions for invalid types more eagerly.

This is the text in question:

(1)
> type, a DOMString which corresponds to the Blob object's type attribute. If 
> not the empty string, user agents must treat it as an RFC2616 media-type 
> [RFC2616], and as an opaque string that can be ignored if it is an invalid 
> media-type. This value must be used as the Content-Type header when 
> dereferencing a Blob URI.
> 


(2)
> type
> The ASCII-encoded string in lower case representing the media type of the 
> Blob, expressed as an RFC2046 MIME type [RFC2046]. On getting, conforming 
> user agents must return the MIME type of the Blob, if it is known. If 
> conforming user agents cannot determine the media type of the Blob, they must 
> return the empty string. A string is a valid MIME type if it matches the 
> media-type token defined in section 3.7 "Media Types" of RFC 2616 [RFC2616]. 
> If not the empty string, user agents must treat it as an RFC2616 media-type 
> [RFC2616], and as an opaque string that can be ignored if it is an invalid 
> media-type. This value must be used as the Content-Type header when 
> dereferencing a Blob URI.


It would be helpful to have the terminology corrected, and to have this 
generally clarified - for example, validity is mentioned here, but seems to be 
unused.

It seems pretty clear from normative text that charset parameter is supposed to 
work. A non-normative example supports that too. I agree with Arun that this 
seems best to keep as is.

However,  is about a different 
case - it's about posting multipart form data that has Blob elements with 
invalid media-types. I'm not even sure which spec is in charge of this behavior 
- I don't think that anything anywhere says that Blob.type affects media-type 
of posted multipart data, even though that's obviously the intention. 
XMLHttpRequest spec defers to HTML, which defers to RFC2388, which mentions 
files "returned via filling out a form", but not Blobs (which is no surprise 
given its age).

Making Blobs only hold valid media-types would solve practical issues, but it 
would be helpful to know what formally defines multipart data serialization 
with blobs.

We also previously had 
 for sending 
non-multipart data. Back then, we determined that "Content-Type: " should be 
sent when the value is invalid. I'm no longer sure if that's right. For this 
case, XMLHttpRequest authoritatively defines the behavior, although heavily 
leaning on File API to decide when the type attribute is empty:

> If the object's type attribute is not the empty string let mime type be its 
> value.


Note that "mime type" is then directly used as default media-type for 
Content-Type header, but it's not parsed to set encoding variable. The encoding 
could be needed to update a charset in author provided Content-Type header 
field in later steps of the algorithm. This is probably not right, as Blob 
should know its encoding better than code that sets header fields on an 
XMLHttpRequest object.
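
A simplified model of the step quoted above, for illustration only. The
function name `contentTypeForSend` is hypothetical, plain objects with a
`type` property stand in for Blobs, and letting an author-set header take
precedence is my reading of "default media-type", not text from either spec.

```javascript
// Decide which Content-Type value would go out when sending a Blob-like
// object. `authorContentType` is the header value set by the page, or null.
function contentTypeForSend(blobLike, authorContentType) {
  if (authorContentType !== null && authorContentType !== undefined) {
    // The author's header is used as-is; blobLike.type is never parsed,
    // so any charset it carries has no influence here.
    return authorContentType;
  }
  // Otherwise the type attribute is used verbatim when it is not empty.
  return blobLike.type !== "" ? blobLike.type : null;
}
```

In this model a Blob typed "text/plain;charset=utf-8" sent with an author
header of "text/plain;charset=iso-8859-1" goes out with the author's charset,
which is the situation questioned above.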

- WBR, Alexey Proskuryakov




Re: File API: Blob.type

2013-03-07 Thread Arun Ranganathan
On Mar 6, 2013, at 7:42 PM, Glenn Maynard wrote: 

On Wed, Mar 6, 2013 at 8:29 AM, Anne van Kesteren  wrote: 
On Wed, Mar 6, 2013 at 2:21 PM, Glenn Maynard  wrote: 
> Blob.type is a MIME type, not a Content-Type header. It's a string of 
> codepoints, not a series of bytes. XHR is a protocol-level API, so maybe it 
> makes sense there, but it doesn't make sense for Blob. 

>> It's a Content-Type header value and should have those restrictions. 

>>> It's not a Content-Type header, it's a MIME type. That's part of a 
>>> Content-Type header, 
>>> but they're not the same thing. 

In fact, the intent is that the value of Blob.type is reflected in the 
Content-Type, and that setting Blob.type means that when fetching that Blob as 
a blob: you'll get the value of Blob.type in the Content-Type header. This 
model *did* allow for charset params -- it always has (perhaps not advertised, 
but it always has). 

At some point there was a draft that specified *strict* parsing for compliance 
with RFC2046, including tokenization ("/") and eliminating non-ASCII cruft. But 
we scrapped that because bugs in all major browser projects showed that this 
spec. text was callously ignored. And I didn't want to spec. fiction, so we 
went with the current model for Blob.type, which is, as Anne points out, pretty 
lax. 

>>That doesn't make sense. Blob.type isn't a string of bytes, it's a string of 
>>Unicode codepoints that happens 
>> to be restricted to the ASCII range. Applying WebKit's validity checks 
>> (with the addition of disallowing nonprintable characters) will make it have 
>> the restrictions you want; 
>> ByteString has nothing to do with this. 

I'm in favor of introducing stricter rules for Blob.type, and I'm also in favor 
of allowing charset params; Glenn's example of 'if(blob.type == "text/plain")' 
will break, but I don't think we should be encouraging strict equality 
comparisons on blob.type (and in fact, should *discourage* it as a practice). 
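
As a rough illustration of what a comparison that tolerates parameters could look like, here is a minimal sketch. The helper names mimeEssence and isMimeType are hypothetical (nothing in the File API defines them), and the parsing is deliberately naive: it only cuts at the first ";" and does not validate the type.

```javascript
// Hypothetical helper, not part of the File API: returns the "type/subtype"
// portion of a MIME type string, lowercased, with any parameters dropped.
function mimeEssence(type) {
  const semicolon = type.indexOf(";");
  const essence = semicolon === -1 ? type : type.slice(0, semicolon);
  return essence.trim().toLowerCase();
}

// Compare a blob.type-like string against an expected type/subtype,
// ignoring parameters such as charset.
function isMimeType(type, expected) {
  return mimeEssence(type) === mimeEssence(expected);
}

const withParam = "text/plain; charset=UTF-8";
console.log(withParam === "text/plain");          // false: strict equality breaks
console.log(isMimeType(withParam, "text/plain")); // true: parameters are ignored
```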

But I'm not sure why we'd choose ByteString in lieu of being strict with
what characters are allowed within DOMString. Anne, can you shed some light on 
this? And of course we should eliminate CR + LF as a possibility at constructor 
invocation time, possibly by throwing. 

Glenn: I think that introducing a separate interface for other parameters 
actually takes away from the elegance of a simple Blob.type. The RFC doesn't 
separate them, and I'm not sure we should either. My reading of the RFC is that 
parameters *are an intrinsic part of* the MIME type. 

-- A* 



Re: File API: Blob.type

2013-03-06 Thread Glenn Maynard
On Wed, Mar 6, 2013 at 8:29 AM, Anne van Kesteren  wrote:

> On Wed, Mar 6, 2013 at 2:21 PM, Glenn Maynard  wrote:
> > Blob.type is a MIME type, not a Content-Type header.  It's a string of
> > codepoints, not a series of bytes.  XHR is a protocol-level API, so maybe
> > it makes sense there, but it doesn't make sense for Blob.
>
> It's a Content-Type header value and should have those restrictions.
>

It's not a Content-Type header, it's a MIME type.  That's part of a
Content-Type header, but they're not the same thing.

But String vs. ByteString has nothing to do with the restrictions applied
to it.

> Making it a ByteString plus additional restrictions will make it do as
> required.
>

That doesn't make sense.  Blob.type isn't a string of bytes, it's a string
of Unicode codepoints that happens to be restricted to the ASCII range.
Applying WebKit's validity checks (with the addition of disallowing
nonprintable characters) will make it have the restrictions you want;
ByteString has nothing to do with this.


On Wed, Mar 6, 2013 at 11:47 AM, Darin Fisher  wrote:

> So the intent is to allow specifying attributes like "charset"?  That
> sounds useful.
>

I don't think so.  This isn't very well-defined by RFC2046 (it seems vague
about the relationship of parameters to MIME types), but I'm pretty sure
Blob.type is meant to be only a MIME type, not a MIME type plus
content-type parameters.  Also, it would lead to a poor API: you could no
longer simply say 'if(blob.type == "text/plain")'; you'd have to parse it
out yourself (which I expect nobody is actually doing).

Other parameters should have a separate interface, eg.
blob.typeParameters.charset = "UTF-8", if we want that.
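
To make the idea of a separate parameters interface concrete, here is a minimal sketch of splitting a type string into its type/subtype and a parameter map. The function splitMimeType is hypothetical, as is the shape of the object it returns; it is not a full RFC 2045 parser (for example, it does not handle quoted-string parameter values).

```javascript
// Hypothetical sketch: split "type/subtype; name=value; ..." into its parts.
// Assumption: parameter values are plain tokens, so no quoted-string handling.
function splitMimeType(value) {
  const [essence, ...rest] = value.split(";");
  const parameters = Object.create(null); // no inherited keys
  for (const piece of rest) {
    const eq = piece.indexOf("=");
    if (eq === -1) continue; // skip malformed pieces that have no "="
    const name = piece.slice(0, eq).trim().toLowerCase();
    const paramValue = piece.slice(eq + 1).trim();
    if (name !== "") parameters[name] = paramValue;
  }
  return { type: essence.trim().toLowerCase(), parameters };
}

const parsed = splitMimeType("text/plain; charset=UTF-8");
console.log(parsed.type);               // "text/plain"
console.log(parsed.parameters.charset); // "UTF-8"
```

With a split like this, the type/subtype stays directly comparable while a charset remains available to code that wants it.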

-- 
Glenn Maynard


Re: File API: Blob.type

2013-03-06 Thread Anne van Kesteren
On Wed, Mar 6, 2013 at 5:47 PM, Darin Fisher  wrote:
> So the intent is to allow specifying attributes like "charset"?  That sounds
> useful.

Yeah I thought so. The value would be fed straight there when
reading as if it were an HTTP response. Arun would know for sure
though.


-- 
http://annevankesteren.nl/



Re: File API: Blob.type

2013-03-06 Thread Darin Fisher
On Wed, Mar 6, 2013 at 6:29 AM, Anne van Kesteren  wrote:

> On Wed, Mar 6, 2013 at 2:21 PM, Glenn Maynard  wrote:
> > Blob.type is a MIME type, not a Content-Type header.  It's a string of
> > codepoints, not a series of bytes.  XHR is a protocol-level API, so maybe
> > it makes sense there, but it doesn't make sense for Blob.
>
> It's a Content-Type header value and should have those restrictions.
> Making it a ByteString plus additional restrictions will make it do as
> required.

So the intent is to allow specifying attributes like "charset"?  That
sounds useful.

-Darin





Re: File API: Blob.type

2013-03-06 Thread Anne van Kesteren
On Wed, Mar 6, 2013 at 2:21 PM, Glenn Maynard  wrote:
> Blob.type is a MIME type, not a Content-Type header.  It's a string of
> codepoints, not a series of bytes.  XHR is a protocol-level API, so maybe it
> makes sense there, but it doesn't make sense for Blob.

It's a Content-Type header value and should have those restrictions.
Making it a ByteString plus additional restrictions will make it do as
required.


-- 
http://annevankesteren.nl/



Re: File API: Blob.type

2013-03-06 Thread Glenn Maynard
On Wed, Mar 6, 2013 at 3:22 AM, Anne van Kesteren  wrote:

> Okay, so given https://bugs.webkit.org/show_bug.cgi?id=111380 I think
> we should put at least minimal restrictions on Blob's constructor
> concerning Blob.type. We made it "anything goes" because in theory
> with Content-Type anything goes. But of course that is false and we
> should have noticed that at the time. Content-Type's value cannot
> contain CRLF, Content-Type's value is also a byte sequence, not a code
> point sequence, and certainly not a code unit sequence.
>
> 1. I think we should change the type from DOMString to ByteString,
> just like XMLHttpRequest has it.
>

Blob.type is a MIME type, not a Content-Type header.  It's a string of
codepoints, not a series of bytes.  XHR is a protocol-level API, so maybe
it makes sense there, but it doesn't make sense for Blob.

WebKit already throws SyntaxError for codepoints outside the ASCII range,
eg. new Blob([], {type: "漢字"}).  This should just be extended to throw for
anything that isn't printable ASCII, which would include CR, LF, and other
control characters (especially NUL).  In other words, anything not in the
Unicode range [U+0020,U+007E].

It doesn't look like WebKit's exception is in the spec.  I think this
should be added.

Also, WebKit lowercases the type parameter.  This doesn't seem to be in the
spec.  (http://dev.w3.org/2006/webapi/FileAPI/#dfn-type says "ASCII-encoded
string in lower case", but that's non-normative.)  I think it should be.
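
The checks described above can be sketched roughly as follows. The function normalizeBlobType is hypothetical: it is not spec text and not WebKit's actual implementation, just one way to express "throw for anything outside U+0020..U+007E, then lowercase".

```javascript
// Hypothetical sketch of the proposed checks; not taken from any browser.
function normalizeBlobType(type) {
  for (let i = 0; i < type.length; i++) {
    const unit = type.charCodeAt(i);
    // Reject anything outside printable ASCII, U+0020 through U+007E.
    // This covers CR, LF, NUL, other control characters, and non-ASCII.
    if (unit < 0x20 || unit > 0x7e) {
      throw new SyntaxError("type contains a character outside U+0020..U+007E");
    }
  }
  return type.toLowerCase();
}

console.log(normalizeBlobType("Text/Plain")); // "text/plain"
for (const bad of ["漢字", "text/plain\r\n", "text/\0plain"]) {
  try {
    normalizeBlobType(bad);
  } catch (e) {
    console.log(e.name); // "SyntaxError"
  }
}
```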

-- 
Glenn Maynard


File API: Blob.type

2013-03-06 Thread Anne van Kesteren
Okay, so given https://bugs.webkit.org/show_bug.cgi?id=111380 I think
we should put at least minimal restrictions on Blob's constructor
concerning Blob.type. We made it "anything goes" because in theory
with Content-Type anything goes. But of course that is false and we
should have noticed that at the time. Content-Type's value cannot
contain CRLF, Content-Type's value is also a byte sequence, not a code
point sequence, and certainly not a code unit sequence.

1. I think we should change the type from DOMString to ByteString,
just like XMLHttpRequest has it.

2. I think we should either throw or ignore setting it to values that
contain CR or LF.

3. Anything I'm missing?
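
For illustration, proposals 1 and 2 could be sketched like this. Both function names are hypothetical, and reading "ignore" as "leave the type as the empty string" is an assumption made for the sketch. The first function imitates the WebIDL ByteString conversion, which throws a TypeError when any code unit is greater than 0xFF.

```javascript
// Hypothetical sketch of proposal 1: imitate the WebIDL ByteString check.
function assertByteString(value) {
  for (let i = 0; i < value.length; i++) {
    if (value.charCodeAt(i) > 0xff) {
      throw new TypeError("value is not a ByteString");
    }
  }
  return value;
}

// Hypothetical sketch of proposal 2, "ignore" variant: a type containing
// CR or LF is dropped and the result is "". The "throw" variant would raise
// an exception at this point instead.
function blobTypeFromOption(value) {
  const type = assertByteString(value);
  return /[\r\n]/.test(type) ? "" : type;
}

console.log(blobTypeFromOption("text/plain")); // "text/plain"
console.log(JSON.stringify(blobTypeFromOption("text/plain\r\nX-Evil: 1"))); // ""
```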


-- 
http://annevankesteren.nl/