On a related note, we do a rubbish job of guessing the content type from
the content of the files themselves via
URLConnection#guessContentTypeFromStream(InputStream).  I've added a bit
more logic in there for the most obvious cases, but considering how much
information a typical Linux 'magic' file holds, we have a long way to go.
My first thought was to ask the platform to guess for us, but I don't
think there is an equivalent on Windows etc.?
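
For illustration only, here is a minimal sketch of the kind of stream
snooping discussed above.  It is not the Harmony implementation: the class
name MagicSniffer, the 64-byte limit and the small signature table are all
assumptions made up for this example, and a real 'magic' file covers far
more types.  The sketch uses mark()/reset() so that the caller's stream is
left untouched, and falls back to "content/unknown" when nothing matches.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Harmony or the JDK.
class MagicSniffer {
    static final String UNKNOWN = "content/unknown";
    private static final int MAX_HEADER = 64; // assumed to be enough for the table below

    // Case-sensitive signatures matched at offset 0 (illustrative subset).
    private static final String[][] EXACT = {
        {"%PDF-", "application/pdf"},
        {"{\\rtf", "text/rtf"},       // the thread notes application/rtf is also registered
        {"GIF8", "image/gif"},
        {"\u0089PNG", "image/png"},
    };

    // Case-insensitive markers matched after leading whitespace.
    private static final String[][] TEXT = {
        {"<!doctype html", "text/html"},
        {"<html", "text/html"},
    };

    // Guess a type from the first bytes of some content.
    static String sniff(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data survives.
        String s = new String(head, StandardCharsets.ISO_8859_1);
        for (String[] sig : EXACT) {
            if (s.startsWith(sig[0])) {
                return sig[1];
            }
        }
        String text = s.trim();
        for (String[] sig : TEXT) {
            if (text.regionMatches(true, 0, sig[0], 0, sig[0].length())) {
                return sig[1];
            }
        }
        return UNKNOWN;
    }

    // Peek at a stream without consuming it; requires mark/reset support.
    static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            return UNKNOWN;
        }
        in.mark(MAX_HEADER);
        byte[] buf = new byte[MAX_HEADER];
        int n = 0;
        try {
            int r;
            while (n < buf.length && (r = in.read(buf, n, buf.length - n)) > 0) {
                n += r;
            }
        } finally {
            in.reset(); // rewind so the caller can still read from the start
        }
        return sniff(Arrays.copyOf(buf, n));
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<HTML><body>hi</body></HTML>".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(html);
        System.out.println(sniff(in));        // prints "text/html"
        System.out.println(in.read() == '<'); // prints "true": the stream was rewound
    }
}
```

A caller would presumably try the file name first and only fall back to
something like sniff() when the extension is unknown, which is the order
the JIRA report suggests for FileURLConnection.getContentType().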

Regards,
Tim

Alexey Petrenko wrote:
> Looks like both application/rtf and text/rtf are correct from IANA's [1]
> point of view.
> So I do not see any harm in following the RI's behavior in this case.
> 
> By the way, the application/rtf specification looks fresher than text/rtf.
> 
> SY, Alexey
> 
> 1. http://www.iana.org/assignments/media-types/
> 
> 2007/8/31, Tim Ellison <[EMAIL PROTECTED]>:
>> The MIME types for a given extension are defined here [1] which we took
>> from httpd's view of the world.  So while it would be trivial to change
>> them to be the same as the RI, I'm inclined to:
>>  - leave rtf as text/rtf
>>  - add java to our list as text/plain
>>  - leave doc as application/msword
>> then figure out how to snoop the stream for other types.
>>
>> [1]
>> http://svn.apache.org/viewvc/harmony/enhanced/classlib/trunk/depends/files/content-types.properties?revision=494047&view=markup
>>
>> Thoughts?
>> Tim
>>
>>
>> Vasily Zakharov (JIRA) wrote:
>>> [classlib][luni] URLConnection.getContentType() works with files incorrectly
>>> ----------------------------------------------------------------------------
>>>
>>>                  Key: HARMONY-4699
>>>                  URL: https://issues.apache.org/jira/browse/HARMONY-4699
>>>              Project: Harmony
>>>           Issue Type: Bug
>>>           Components: Classlib
>>>             Reporter: Vasily Zakharov
>>>
>>>
>>> In the Harmony implementation, java.net.URLConnection.getContentType()
>>> works incorrectly when it addresses a file URL:
>>>
>>> 1. For files with .rtf extension, RI returns "application/rtf", while 
>>> Harmony returns "text/rtf".
>>>
>>> 2. For files with .java extension, RI returns "text/plain", while Harmony 
>>> returns "content/unknown".
>>>
>>> 3. For files with .doc extension, RI returns "content/unknown", while 
>>> Harmony returns "application/msword". The same is true for other known 
>>> extensions.
>>>
>>> 4. For files with an unrecognized extension and with HTML content, RI
>>> returns "text/html", while Harmony returns "content/unknown".
>>>
>>> Items 1 and 2 look like minor issues that would be better fixed for
>>> compatibility with RI.
>>>
>>> Item 3 looks like a non-bug difference, as Harmony behaves clearly better 
>>> than RI in these cases.
>>>
>>> Item 4 looks like a serious bug, as RI clearly looks into file content for 
>>> the file type, and Harmony does not. Looks like 
>>> org.apache.harmony.luni.internal.net.www.protocol.file.FileURLConnection.getContentType()
>>>  needs to be fixed to use guessContentTypeFromStream() in addition to 
>>> guessContentTypeFromName().
>>>
>>> The attached archive contains the reproducer with some test files it uses. 
>>> Here's the reproducer code:
>>>
>>> public class Test {
>>>     static void printContentType(String fileName) throws java.io.IOException {
>>>         System.out.println(fileName + ": "
>>>                 + new java.net.URL("file:" + fileName).openConnection().getContentType());
>>>     }
>>>     public static void main(String[] args) {
>>>         try {
>>>             printContentType("test.rtf");
>>>             printContentType("Test.java");
>>>             printContentType("test.doc");
>>>             printContentType("test.htx");
>>>         } catch (Exception e) {
>>>             e.printStackTrace(System.out);
>>>         }
>>>     }
>>> }
>>>
>>> Output on RI:
>>>
>>> test.rtf: application/rtf
>>> Test.java: text/plain
>>> test.doc: content/unknown
>>> test.htx: text/html
>>>
>>> Output on Harmony:
>>>
>>> test.rtf: text/rtf
>>> Test.java: content/unknown
>>> test.doc: application/msword
>>> test.htx: content/unknown
>>>
>>> This issue is a blocker for HARMONY-4696, as on RI 
>>> JEditorPane.getContentType() should be based on 
>>> URLConnection.getContentType() that now works incorrectly.
>>>
>>>
> 
