On Apr 5, 2013, at 3:17 PM, Alexey Proskuryakov wrote:

> 
> On Apr 3, 2013, at 13:11, Arun Ranganathan <a...@mozilla.com> wrote:
> 
>>> My only concern is that blob.type should never contain parameters.  
>>> Comparing it to "text/plain" or "image/jpeg" should work, and not 
>>> mysteriously fail a year later when somebody eventually throws a MIME type 
>>> parameter into the mix.  Today, all browsers expose text files at 
>>> text/plain.  If a browser a year from now decides to call text files with a 
>>> UTF-8 BOM "text/plain; charset=UTF-8", it'll break interop.
> 
> What specifies how a File gets its type? The only requirement I can find is 
> that "User agents must not attempt heuristic determination of type", which I 
> think implies that something like inputElement.files[0].type is always "" for 
> a file chosen by a user via <input type=file>.


The spec. now overreaches a bit :-( 

Not allowing heuristic mechanisms was merely to restrict encoding determination 
as per at least one implementation's experience with it being substandard: 
https://bugzilla.mozilla.org/show_bug.cgi?id=848842

But now maybe we're going a bit far.  Should we standardize how UAs 
auto-detect file types, including something about extensions and some BOM 
methods?  This seems complicated and may be unnecessary -- most UAs do 
this just about right in the absence of a standard.


> Guessing MIME type from file name or metadata is always a heuristic, as not 
> all platforms will know that "archive.sit" means "application/x-stuffit".
> 
> At the same time, browsers do autodetect types for many files. We'll need to 
> autodetect when serializing a form for submission anyway, so exposing this 
> information a little earlier only makes sense.
> 
> I think that these concerns can be resolved by specifying what File.type is 
> more explicitly. The spec can just say that parameters are not allowed in the 
> browser-chosen type.


That seems sensible!  By *not* allowing charset parameters in types determined 
by UAs, these are now set by web applications only, which may mitigate Glenn's 
concerns.

Maybe the way forward is to leave this to UAs, and:

1. Say UAs should return file type, if known.
2. UAs must not use heuristics or statistical methods to determine encoding, and
3. UAs must not set the charset parameter in the returned type for text/plain.  
This will then defer to the encoding spec. and attempt fallback decoding.  
Where a web application sets a charset parameter, this will do the right thing 
for readAsText with fallback decoding.
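
A rough sketch of points 1 and 3 in JavaScript, purely for illustration.  
typeForDetectedFile is a hypothetical name, not anything from the File API 
draft, and it assumes the UA has already arrived at a MIME type string (or at 
nothing) by whatever means it uses.  It drops all parameters, which is the 
broader rule Alexey suggested, rather than only charset on text/plain:

```javascript
// Hypothetical helper, not part of any spec: models what a UA might
// expose as File.type for a file whose type the UA determined itself.
function typeForDetectedFile(detected) {
  // Point 1: return the type if known, otherwise the empty string.
  if (typeof detected !== "string" || detected.trim() === "") {
    return "";
  }
  // Point 3 (broadened): drop any parameters, such as charset,
  // so that only "type/subtype" is exposed.
  return detected.split(";")[0].trim().toLowerCase();
}
```

With something like this, a comparison against "text/plain" keeps working 
even if the UA internally knows more about the file than the bare type.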

> 
>>> Additionally, determining a blob's file type seems like the most obvious 
>>> use of this property, and making people say "if(blob.type.split(";")[0] == 
>>> 'text/plain')" is simply not a good interface.
>> 
>> 
>> OK -- you're strongly opinionated on the matter of NOT allowing a charset 
>> parameter.  I'd like to see if implementers who had an opinion on its 
>> usefulness can weigh in -- Darin?  Alexey?
> 
> 
> I do not have a very strong opinion. I like the simpler API of passing 
> parameters through the type attribute, as it's specified currently. This also 
> matches XMLHttpRequest API better. And of course, keeping existing behavior 
> means that we won't break the web.

I like it too.  We keep charset, but don't let user agents set it for 
auto-detected files; it can only be set with a Blob constructor or a slice 
call.  Blob.type is a string that can be set by developers and has normative 
requirements that are not strict tokenization requirements, so I think we're 
fine here.
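
For what it's worth, a web application that sets a charset itself can still 
compare types without tripping over the parameter.  A minimal sketch, where 
hasBaseType is a made-up helper and the type strings are example values, not 
output from any particular browser:

```javascript
// Hypothetical helper: compares only the "type/subtype" part of a
// type string, ignoring any parameters after the first ";".
function hasBaseType(type, expected) {
  const base = String(type).split(";")[0].trim().toLowerCase();
  return base === expected.toLowerCase();
}

// A type set by the application (for example through the Blob
// constructor's `type` option) may carry a charset parameter.
const appSetType = "text/plain; charset=UTF-8";

// A type chosen by the UA would, under this proposal, carry none.
const uaChosenType = "text/plain";

const bothAreText =
  hasBaseType(appSetType, "text/plain") &&
  hasBaseType(uaChosenType, "text/plain");
```

Only code that deliberately sets a parameter needs a helper like this; 
UA-chosen types could be compared with a plain string equality check.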

-- A*
