[ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pdwalker updated TIKA-2608:
---------------------------
    Description: 
When Tika "detects" the following file, it returns the wrong content type:

{{$ curl -I [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]}}
{{HTTP/1.1 200 OK}}
{{Server: nginx/1.10.3 (Ubuntu)}}
{{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
{{Content-Type: text/x-matlab}}
{{[snip]}}
{{X-Frame-Options: SAMEORIGIN}}

However, the unminified version of the same file returns the correct type:

{{$ curl -I [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js]}}
{{HTTP/1.1 200 OK}}
{{Server: nginx/1.10.3 (Ubuntu)}}
{{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
{{Content-Type: application/javascript}}
{{[snip]}}
{{X-Frame-Options: SAMEORIGIN}}

This becomes a problem when my XWiki installation is behind an SSL proxy (nginx) and I enable the {{add_header X-Content-Type-Options nosniff;}} header.

Modern browsers return the following error:
{quote}Refused to execute script from 
'[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js|https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]'
 because its MIME type ('text/x-matlab') is not executable, and strict MIME 
type checking is enabled.
{quote}
My "solution" is to disable strict MIME type checking in the SSL proxy, but I don't think that is ideal. It would be better if the MATLAB parser didn't claim random minified JS files as its own.
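The browser error follows from how nosniff is defined: for script requests, the WHATWG Fetch Standard blocks the response unless its Content-Type essence is one of the "JavaScript MIME types" listed in the WHATWG MIME Sniffing Standard. A minimal Java sketch of that check is below; the class and method names are hypothetical, and the parameter stripping is simplified rather than a full header parser. It shows why {{text/x-matlab}} is refused while {{application/javascript}} passes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class NosniffScriptCheck {

    // The "JavaScript MIME type" essences listed in the WHATWG MIME Sniffing
    // Standard. With X-Content-Type-Options: nosniff, a script response whose
    // Content-Type essence is not one of these is blocked.
    static final Set<String> JAVASCRIPT_MIME_TYPES = new HashSet<>(Arrays.asList(
            "application/ecmascript", "application/javascript",
            "application/x-ecmascript", "application/x-javascript",
            "text/ecmascript", "text/javascript",
            "text/javascript1.0", "text/javascript1.1", "text/javascript1.2",
            "text/javascript1.3", "text/javascript1.4", "text/javascript1.5",
            "text/jscript", "text/livescript",
            "text/x-ecmascript", "text/x-javascript"));

    // Hypothetical, simplified essence extraction: drop parameters such as
    // "; charset=UTF-8", trim whitespace, and lower-case the result.
    static String essence(String contentType) {
        int semi = contentType.indexOf(';');
        String type = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // True if a browser enforcing nosniff would execute a script served
    // with this Content-Type header value.
    static boolean executableUnderNosniff(String contentType) {
        return contentType != null
                && JAVASCRIPT_MIME_TYPES.contains(essence(contentType));
    }

    public static void main(String[] args) {
        // Content-Type values from the two curl responses in this report.
        System.out.println(executableUnderNosniff("text/x-matlab"));          // false
        System.out.println(executableUnderNosniff("application/javascript")); // true
    }
}
```

Because the browser only looks at the header value, correcting the type Tika detects for the minified file (or overriding Content-Type for .js files at the proxy) should let the script load with nosniff still enabled.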

Edit: I marked the problem as being with the MATLAB parser, but that may be incorrect; I'm not sure exactly which code does the detection.


> tika matlab parser incorrectly identifies content type of minified javascript 
> file
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-2608
>                 URL: https://issues.apache.org/jira/browse/TIKA-2608
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>         Environment: * xwiki 10.1,
>  * Tomcat 8 (8.0.32-1ubuntu1)
>  * Ubuntu 16.04.4 LTS
>  * Oracle Java 1.8.0_161-b12
>            Reporter: pdwalker
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
