[ https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
pdwalker updated TIKA-2608:
---------------------------
    Description: 
When Tika "detects" the following file, it returns the wrong content type:

{{$ curl -I [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]}}
{{HTTP/1.1 200 OK}}
{{Server: nginx/1.10.3 (Ubuntu)}}
{{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
{{Content-Type: text/x-matlab}}
{{ [snip]}}
{{X-Frame-Options: SAMEORIGIN}}

However, the unminified version of the same file returns the correct type:

{{$ curl -I [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js]}}
{{HTTP/1.1 200 OK}}
{{Server: nginx/1.10.3 (Ubuntu)}}
{{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
{{Content-Type: application/javascript}}
{{ [snip]}}
{{X-Frame-Options: SAMEORIGIN}}

This becomes a problem when my xwiki installation is behind an SSL proxy (nginx) and I enable the {{add_header X-Content-Type-Options nosniff;}} header. Modern browsers then return the following error:
{quote}Refused to execute script from '[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js|https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]' because its MIME type ('text/x-matlab') is not executable, and strict MIME type checking is enabled.
{quote}
My "solution" is to disable strict MIME type checking in the SSL proxy, but I don't think that is ideal. It would be better if the matlab parser didn't claim random minified JS files as its own.

 

Note:

Edit: I marked the problem as being with the matlab parser, but that may be incorrect. I'm not sure exactly which code actually does the detection.
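
The browser error quoted above follows from how {{nosniff}} works: a script response is blocked unless its Content-Type is one of a fixed set of JavaScript MIME types. The Python sketch below illustrates that check, plus one possible extension-based guard. The function names are hypothetical and are not part of Tika, xwiki, or nginx, and the guard is an assumed workaround, not a description of how Tika resolves types.

```python
# Hypothetical helpers for illustration only; none of these names exist in
# Tika, xwiki, or nginx.

# JavaScript MIME type essences as listed in the WHATWG MIME Sniffing standard.
JAVASCRIPT_MIME_TYPES = frozenset({
    "application/ecmascript", "application/javascript",
    "application/x-ecmascript", "application/x-javascript",
    "text/ecmascript", "text/javascript",
    "text/javascript1.0", "text/javascript1.1", "text/javascript1.2",
    "text/javascript1.3", "text/javascript1.4", "text/javascript1.5",
    "text/jscript", "text/livescript",
    "text/x-ecmascript", "text/x-javascript",
})


def essence(content_type: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def script_allowed_under_nosniff(content_type: str) -> bool:
    """True if a browser enforcing nosniff would accept this type for a script."""
    return essence(content_type) in JAVASCRIPT_MIME_TYPES


def corrected_type(path: str, detected: str) -> str:
    """Assumed workaround: trust a .js extension over a non-JavaScript detection."""
    if path.lower().endswith(".js") and not script_allowed_under_nosniff(detected):
        return "application/javascript"
    return detected


if __name__ == "__main__":
    for name, detected in [
        ("mxGraphEditor.min.js", "text/x-matlab"),
        ("mxGraphEditor.js", "application/javascript"),
    ]:
        print(name, detected,
              script_allowed_under_nosniff(detected),
              corrected_type(name, detected))
```

With the two responses from the report, {{text/x-matlab}} fails the check and {{application/javascript}} passes, which matches the behaviour seen in the browser.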
> tika matlab parser incorrectly identifies content type of minified javascript file
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-2608
>                 URL: https://issues.apache.org/jira/browse/TIKA-2608
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>         Environment: * xwiki 10.1,
> * Tomcat 8 (8.0.32-1ubuntu1)
> * Ubuntu 16.04.4 LTS
> * Oracle Java 1.8.0_161-b12
>            Reporter: pdwalker
>            Priority: Minor
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)