[jira] [Closed] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-19 Thread pdwalker (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pdwalker closed TIKA-2608.
--

Appropriate fix is already in.

> tika matlab parser incorrectly identifies content type of minified javascript 
> file
> --
>
> Key: TIKA-2608
> URL: https://issues.apache.org/jira/browse/TIKA-2608
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.17, 1.18
> Environment: * xwiki 10.1,
>  * Tomcat 8 (8.0.32-1ubuntu1)
>  * Ubuntu 16.04.4 LTS
>  * Oracle Java 1.8.0_161-b12
>Reporter: pdwalker
>Priority: Minor
> Fix For: 2.0.0
>
>
> When Tika "detects" the following file, it returns the wrong content type:
> {{$ curl -I [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]}}
>  {{HTTP/1.1 200 OK}}
>  {{Server: nginx/1.10.3 (Ubuntu)}}
>  {{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
>  {{Content-Type: text/x-matlab}}
>  {{  [snip]}}
>  {{X-Frame-Options: SAMEORIGIN}}
> However, the unminified version of the same file returns the correct type:
> {{$ curl -I [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js]}}
>  {{HTTP/1.1 200 OK}}
>  {{Server: nginx/1.10.3 (Ubuntu)}}
>  {{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
>  {{Content-Type: application/javascript}}
>  {{  [snip]}}
>  {{X-Frame-Options: SAMEORIGIN}}
> This causes a problem when my XWiki installation is behind an SSL proxy
> (nginx) and I enable the {{add_header X-Content-Type-Options nosniff;}} header.
> Modern browsers return the following error:
> {quote}Refused to execute script from 
> '[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js|https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]'
>  because its MIME type ('text/x-matlab') is not executable, and strict MIME 
> type checking is enabled.
> {quote}
> My "solution" is to disable strict MIME type checking in the SSL proxy,
> but I don't think that is ideal. It would be better if the MATLAB parser
> didn't claim random minified JS files as its own.
>  
> Note:
> Edit: I marked the problem as being with the MATLAB parser, but that may be
> incorrect; I'm not sure exactly which code does the detection.
>  
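
The {{curl -I}} check in the report comes down to reading one response header. A minimal Java sketch of the same check, assuming the raw header text is already in hand (for example pasted from {{curl -I}} output); {{HeaderCheck}} and {{contentType}} are hypothetical names, not part of Tika, XWiki, or nginx:

```java
import java.util.Locale;

/**
 * Hypothetical helper: extracts the Content-Type media type from raw HTTP
 * response headers such as the output of `curl -I`.
 */
final class HeaderCheck {

    private HeaderCheck() {}

    /** Returns the media type (parameters dropped, lower-cased), or null if no Content-Type header. */
    static String contentType(String rawHeaders) {
        for (String line : rawHeaders.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // status line ("HTTP/1.1 200 OK") or blank line
            }
            String name = line.substring(0, colon).trim();
            if (name.equalsIgnoreCase("Content-Type")) {
                String value = line.substring(colon + 1).trim();
                int semi = value.indexOf(';');
                if (semi >= 0) {
                    value = value.substring(0, semi).trim(); // drop e.g. "; charset=UTF-8"
                }
                return value.toLowerCase(Locale.ROOT);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String minified = "HTTP/1.1 200 OK\n"
                + "Server: nginx/1.10.3 (Ubuntu)\n"
                + "Content-Type: text/x-matlab\n";
        System.out.println(contentType(minified)); // prints text/x-matlab
    }
}
```

Dropping parameters and lower-casing keeps the result suitable for an exact comparison against a list of expected types.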


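The browser error quoted in the report follows from how {{X-Content-Type-Options: nosniff}} works: a script response is blocked unless its MIME type is a JavaScript MIME type. A sketch of that rule in Java, assuming the list of JavaScript MIME type essences from the WHATWG MIME Sniffing Standard as we understand it (individual browsers in 2018 may have differed slightly); {{NosniffCheck}} is a hypothetical name:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Sketch of the nosniff rule for scripts: the response's MIME type essence
 * must be a JavaScript MIME type (list assumed from the WHATWG MIME Sniffing Standard).
 */
final class NosniffCheck {

    private static final Set<String> JS_ESSENCES = new HashSet<>(Arrays.asList(
            "application/ecmascript", "application/javascript",
            "application/x-ecmascript", "application/x-javascript",
            "text/ecmascript", "text/javascript",
            "text/javascript1.0", "text/javascript1.1", "text/javascript1.2",
            "text/javascript1.3", "text/javascript1.4", "text/javascript1.5",
            "text/jscript", "text/livescript",
            "text/x-ecmascript", "text/x-javascript"));

    private NosniffCheck() {}

    /** True if a script served with this Content-Type would be allowed to run under nosniff. */
    static boolean scriptAllowed(String contentType) {
        if (contentType == null) {
            return false; // no usable MIME type: blocked under nosniff
        }
        String essence = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return JS_ESSENCES.contains(essence);
    }

    public static void main(String[] args) {
        System.out.println(scriptAllowed("application/javascript")); // prints true
        System.out.println(scriptAllowed("text/x-matlab"));          // prints false
    }
}
```

Under this rule {{text/x-matlab}} is always blocked, which is why the only proxy-side workaround available to the reporter was to drop {{nosniff}}.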

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-19 Thread pdwalker (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405875#comment-16405875
 ] 

pdwalker commented on TIKA-2608:


Works for me. Thanks!



[jira] [Updated] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-18 Thread pdwalker (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pdwalker updated TIKA-2608:
---
Affects Version/s: 1.18



[jira] [Commented] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-18 Thread pdwalker (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404358#comment-16404358
 ] 

pdwalker commented on TIKA-2608:


Is there any chance that this fix in the 2.0.0 branch could be backported into 
the 1.1x branch?



[jira] [Commented] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-18 Thread pdwalker (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404356#comment-16404356
 ] 

pdwalker commented on TIKA-2608:


1.18 snapshot: {color:#FF}failure{color}

{{*$ java -jar tika-app-1.18-SNAPSHOT.jar -d mxGraphEditor.min.js*}}
{{Mar 19, 2018 12:11:48 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{J2KImageReader not loaded. JPEG2000 files will not be processed.}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{Mar 19, 2018 12:11:48 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: org.xerial's sqlite-jdbc is not loaded.}}
{{Please provide the jar on your classpath to parse sqlite files.}}
{{See tika-parsers/pom.xml for the correct version.}}
{{*text/x-matlab*}}



[jira] [Commented] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-18 Thread pdwalker (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404353#comment-16404353
 ] 

pdwalker commented on TIKA-2608:


Ah, I found the 1.18 branch.  Checking against that version now.



[jira] [Resolved] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-18 Thread pdwalker (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pdwalker resolved TIKA-2608.

   Resolution: Fixed
Fix Version/s: 2.0.0

Problem appears resolved in the 2.0.0-SNAPSHOT build from 2018-03-19.



[jira] [Commented] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-18 Thread pdwalker (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404348#comment-16404348
 ] 

pdwalker commented on TIKA-2608:


I couldn't find any convenient nightly builds, so I checked out the repository
and ran {{mvn clean install}}.

I found a tika-app-2.0.0-SNAPSHOT.jar in the tika-app/target directory.

Results:

{{*$ java -jar tika-app-2.0.0-SNAPSHOT.jar -d mxGraphEditor.min.js*}}
{{Mar 19, 2018 11:47:07 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{J2KImageReader not loaded. JPEG2000 files will not be processed.}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{Mar 19, 2018 11:47:07 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: org.xerial's sqlite-jdbc is not loaded.}}
{{Please provide the jar on your classpath to parse sqlite files.}}
{{See tika-parsers/pom.xml for the correct version.}}
{{*application/javascript*}}

So, I'd say the problem has been resolved somewhere since 1.17.

Also, I wasn't able to find a 1.18 tag (or something similar) in the code 
repository, so I wasn't able to test that specific version.

Thanks.
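
Comparing builds this way means reading the detected type out of a lot of warning noise. The same {{-d}} run can be scripted; a minimal Java sketch, assuming {{java}} is on the PATH and that tika-app writes the detected type as the last line on stdout while the optional-dependency warnings go to stderr (an assumption drawn from the output above, not verified against the Tika source). {{DetectRunner}} is a hypothetical name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper around `java -jar <tika-app jar> -d <file>`. */
final class DetectRunner {

    private DetectRunner() {}

    /** Runs the detector and returns the last non-blank stdout line (assumed to be the MIME type). */
    static String detect(String jarPath, String filePath) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("java", "-jar", jarPath, "-d", filePath);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // warnings (assumed on stderr) pass through to our console
        Process p = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tika-app exited with status " + exit);
        }
        return lastNonBlank(lines);
    }

    /** Returns the last line that is not blank, trimmed, or null if there is none. */
    static String lastNonBlank(List<String> lines) {
        for (int i = lines.size() - 1; i >= 0; i--) {
            String s = lines.get(i).trim();
            if (!s.isEmpty()) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java DetectRunner <tika-app jar> <file>");
            return;
        }
        System.out.println(detect(args[0], args[1]));
    }
}
```

Running it once per jar (1.17, 1.18-SNAPSHOT, 2.0.0-SNAPSHOT) against the same file gives one line per build to compare.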



[jira] [Commented] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-18 Thread pdwalker (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404321#comment-16404321
 ] 

pdwalker commented on TIKA-2608:


I will certainly try that.  I'm looking for where the nightly builds are stored 
now.  I will report back with my findings.



[jira] [Commented] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-16 Thread pdwalker (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401740#comment-16401740
 ] 

pdwalker commented on TIKA-2608:


Running tika-app-1.17.jar on the two files gives the following:

{{*$ java -jar tika-app-1.17.jar -d mxGraphEditor.js*}}
{{Mar 16, 2018 6:57:53 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{TIFFImageWriter not loaded. tiff files will not be processed}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{J2KImageReader not loaded. JPEG2000 files will not be processed.}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{Mar 16, 2018 6:57:54 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: org.xerial's sqlite-jdbc is not loaded.}}
{{Please provide the jar on your classpath to parse sqlite files.}}
{{See tika-parsers/pom.xml for the correct version.}}
{{*application/javascript*}}

and

{{*$ java -jar tika-app-1.17.jar -d mxGraphEditor.min.js*}}
{{Mar 16, 2018 6:58:11 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{TIFFImageWriter not loaded. tiff files will not be processed}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{J2KImageReader not loaded. JPEG2000 files will not be processed.}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{Mar 16, 2018 6:58:11 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: org.xerial's sqlite-jdbc is not loaded.}}
{{Please provide the jar on your classpath to parse sqlite files.}}
{{See tika-parsers/pom.xml for the correct version.}}
{{*text/x-matlab*}}

So it is reproducible.
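
Both file names end in {{.js}}, so in 1.17 the minified file's content apparently outweighed its extension. One hypothetical way for a web application to guard against this, independent of the Tika fix, is to let a small set of trusted asset extensions override whatever the detector reports. This is an illustrative policy, not how Tika or XWiki resolved the issue; {{AssetTypePolicy}} and its extension table are assumptions:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical serving-side policy (not Tika's fix): for a few web-asset
 * extensions, the file name wins over the content-based detection result.
 */
final class AssetTypePolicy {

    // Assumed table of extensions whose type should never be content-sniffed.
    private static final Map<String, String> TRUSTED = new HashMap<>();
    static {
        TRUSTED.put("js", "application/javascript");
        TRUSTED.put("css", "text/css");
    }

    private AssetTypePolicy() {}

    /** Returns the trusted type for the file's last extension, else the detector's answer. */
    static String resolve(String fileName, String detected) {
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String trusted = TRUSTED.get(ext);
            if (trusted != null) {
                return trusted; // e.g. "mxGraphEditor.min.js" -> "js"
            }
        }
        return detected; // no trusted extension: keep the detector's result
    }
}
```

The trade-off is that a file whose extension is wrong gets served under the wrong type, but for static assets such as webjars the extension is normally authoritative.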



[jira] [Issue Comment Deleted] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-16 Thread pdwalker (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pdwalker updated TIKA-2608:
---
Comment: was deleted

(was: I tried downloading the tika-app.jar file from
[http://tika.apache.org/download.html], but the jar appears corrupted and the
sha1/md5 sums do not match what is listed on the website.  :-\)

> tika matlab parser incorrectly identifies content type of minified javascript 
> file
> --
>
> Key: TIKA-2608
> URL: https://issues.apache.org/jira/browse/TIKA-2608
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.17
> Environment: * xwiki 10.1,
>  * Tomcat 8 (8.0.32-1ubuntu1)
>  * Ubuntu 16.04.4 LTS
>  * Oracle Java 1.8.0_161-b12
>Reporter: pdwalker
>Priority: Minor
>
> When the tika "detects" the following file, it returns the wrong content type:
> {{$ curl -I 
> [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]}}
>  {{HTTP/1.1 200 OK}}
>  {{Server: nginx/1.10.3 (Ubuntu)}}
>  {{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
>  {{Content-Type: text/x-matlab}}
>  {{  [snip]}}
>  {{X-Frame-Options: SAMEORIGIN}}
> However, the unminified version of the same file returns the correct type:
> {{$ curl -I 
> [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js]}}
>  {{HTTP/1.1 200 OK}}
>  {{Server: nginx/1.10.3 (Ubuntu)}}
>  {{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
>  {{Content-Type: application/javascript}}
>  {{  [snip]}}
>  {{X-Frame-Options: SAMEORIGIN}}
> The problem this causes is when my xwiki installation is behind an ssl proxy 
> (nginx) and I enable the add_header X-Content-Type-Options nosniff; header.  
> Modern browsers return the following error:
> {quote}Refused to execute script from 
> '[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js|https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]'
>  because its MIME type ('text/x-matlab') is not executable, and strict MIME 
> type checking is enabled.
> {quote}
> My "solution" is to disable strict MIME type checking in the SSL proxy,
> but I don't think that is ideal. It'd be better if the MATLAB parser didn't
> claim random minified JS files as its own.
>  
> Note:
> Edit: I marked the problem as being with the MATLAB parser, but that may be
> incorrect - I'm not sure exactly what code actually does the detection.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-16 Thread pdwalker (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401734#comment-16401734
 ] 

pdwalker commented on TIKA-2608:


I tried downloading the tika-app.jar file from
[http://tika.apache.org/download.html], but the jar appears corrupted and the
sha1/md5 sums do not match what is listed on the website. :-\
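A downloaded release can be checked locally before use. The sketch below (Python, standard library only) streams a file through {{hashlib}} and compares the result with a published hex digest. The function names are hypothetical, and which algorithm to pass depends on what the download page actually publishes for that release.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1", chunk_size: int = 65536) -> str:
    """Return the hex digest of the file at `path`, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, expected_hex: str, algorithm: str = "sha1") -> bool:
    """Compare the computed digest with a published one, ignoring case and surrounding whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, {{matches_published("tika-app.jar", published_sha1)}} returning False points to a truncated or substituted download; one possible cause worth ruling out is having saved an HTML page from the download site instead of the jar itself.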



[jira] [Updated] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-16 Thread pdwalker (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pdwalker updated TIKA-2608:
---
Description: 
When Tika "detects" the following file, it returns the wrong content type:

{{$ curl -I 
[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]}}
 {{HTTP/1.1 200 OK}}
 {{Server: nginx/1.10.3 (Ubuntu)}}
 {{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
 {{Content-Type: text/x-matlab}}
 {{  [snip]}}
 {{X-Frame-Options: SAMEORIGIN}}

However, the unminified version of the same file returns the correct type:

{{$ curl -I 
[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js]}}
 {{HTTP/1.1 200 OK}}
 {{Server: nginx/1.10.3 (Ubuntu)}}
 {{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
 {{Content-Type: application/javascript}}
 {{  [snip]}}
 {{X-Frame-Options: SAMEORIGIN}}

The problem occurs when my XWiki installation is behind an SSL proxy
(nginx) and I enable the {{add_header X-Content-Type-Options nosniff;}} header.

Modern browsers return the following error:
{quote}Refused to execute script from 
'[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js|https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]'
 because its MIME type ('text/x-matlab') is not executable, and strict MIME 
type checking is enabled.
{quote}
My "solution" is to disable strict MIME type checking in the SSL proxy, but
I don't think that is ideal. It'd be better if the MATLAB parser didn't claim
random minified JS files as its own.
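The browser error follows from how {{X-Content-Type-Options: nosniff}} is applied to scripts: the WHATWG Fetch standard blocks a script response whose Content-Type is not a JavaScript MIME type, and {{text/x-matlab}} is not one. A minimal Python sketch of that check, assuming the JavaScript MIME type list from the WHATWG MIME Sniffing standard; the function names are hypothetical and MIME parsing is simplified to taking the part before the first {{;}}.

```python
from typing import Optional

# JavaScript MIME type essences as listed in the WHATWG MIME Sniffing standard.
JAVASCRIPT_MIME_ESSENCES = frozenset({
    "application/ecmascript",
    "application/javascript",
    "application/x-ecmascript",
    "application/x-javascript",
    "text/ecmascript",
    "text/javascript",
    "text/javascript1.0",
    "text/javascript1.1",
    "text/javascript1.2",
    "text/javascript1.3",
    "text/javascript1.4",
    "text/javascript1.5",
    "text/jscript",
    "text/livescript",
    "text/x-ecmascript",
    "text/x-javascript",
})


def is_javascript_mime_type(content_type: str) -> bool:
    """Simplified parse: lower-case the essence (text before any ';') and look it up."""
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in JAVASCRIPT_MIME_ESSENCES


def blocked_by_nosniff(content_type: Optional[str], nosniff: bool) -> bool:
    """Model only the nosniff rule for script requests.

    The separate rule that blocks image/*, audio/*, video/* and text/csv
    scripts regardless of nosniff is deliberately left out.
    """
    if not nosniff:
        return False
    return content_type is None or not is_javascript_mime_type(content_type)
```

Here {{blocked_by_nosniff("text/x-matlab", True)}} is True, matching the "not executable" error for the minified file, while {{blocked_by_nosniff("application/javascript", True)}} is False, matching the unminified one. Without nosniff, browsers generally execute the script anyway, which is why dropping the header hides the symptom without correcting the mislabelled type.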

 

Note:

Edit: I marked the problem as being with the MATLAB parser, but that may be
incorrect - I'm not sure exactly what code actually does the detection.

 

  was:
When Tika "detects" the following file, it returns the wrong content type:

{{$ curl -I 
[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]}}
 {{HTTP/1.1 200 OK}}
 {{Server: nginx/1.10.3 (Ubuntu)}}
 {{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
 {{Content-Type: text/x-matlab}}
{{  [snip]}}
{{X-Frame-Options: SAMEORIGIN}}

However, the unminified version of the same file returns the correct type:

{{$ curl -I 
[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js]}}
 {{HTTP/1.1 200 OK}}
 {{Server: nginx/1.10.3 (Ubuntu)}}
 {{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
 {{Content-Type: application/javascript}}
{{  [snip]}}
{{X-Frame-Options: SAMEORIGIN}}

The problem occurs when my XWiki installation is behind an SSL proxy
(nginx) and I enable the {{add_header X-Content-Type-Options nosniff;}} header.

Modern browsers return the following error:
{quote}Refused to execute script from 
'[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js|https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]'
 because its MIME type ('text/x-matlab') is not executable, and strict MIME 
type checking is enabled.
{quote}
My "solution" is to disable strict MIME type checking in the SSL proxy, but
I don't think that is ideal. It'd be better if the MATLAB parser didn't claim
random minified JS files as its own.



[jira] [Updated] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-16 Thread pdwalker (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pdwalker updated TIKA-2608:
---
Description: 
When Tika "detects" the following file, it returns the wrong content type:

{{$ curl -I 
[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]}}
 {{HTTP/1.1 200 OK}}
 {{Server: nginx/1.10.3 (Ubuntu)}}
 {{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
 {{Content-Type: text/x-matlab}}
{{  [snip]}}
{{X-Frame-Options: SAMEORIGIN}}

However, the unminified version of the same file returns the correct type:

{{$ curl -I 
[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js]}}
 {{HTTP/1.1 200 OK}}
 {{Server: nginx/1.10.3 (Ubuntu)}}
 {{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
 {{Content-Type: application/javascript}}
{{  [snip]}}
{{X-Frame-Options: SAMEORIGIN}}

The problem occurs when my XWiki installation is behind an SSL proxy
(nginx) and I enable the {{add_header X-Content-Type-Options nosniff;}} header.

Modern browsers return the following error:
{quote}Refused to execute script from 
'[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js|https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]'
 because its MIME type ('text/x-matlab') is not executable, and strict MIME 
type checking is enabled.
{quote}
My "solution" is to disable strict MIME type checking in the SSL proxy, but
I don't think that is ideal. It'd be better if the MATLAB parser didn't claim
random minified JS files as its own.

  was:
When Tika "detects" the following file, it returns the wrong content type:

{{$ curl -I 
https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js}}
{{HTTP/1.1 200 OK}}
{{Server: nginx/1.10.3 (Ubuntu)}}
{{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
{{Content-Type: text/x-matlab}}
{{Connection: keep-alive}}
{{Access-Control-Allow-Origin: *}}
{{Set-Cookie: JSESSIONID=B1FD2399240BB7BEA6EC83095806491F; Path=/xwiki/; 
HttpOnly}}
{{Cache-Control: public}}
{{Expires: Sat, 16 Mar 2019 10:09:54 GMT}}
{{Last-Modified: Fri, 16 Mar 2018 10:09:54 GMT}}
{{Strict-Transport-Security: max-age=31536000; includeSubdomains}}
{{X-Frame-Options: SAMEORIGIN}}

However, the unminified version of the same file returns the correct type:

{{$ curl -I 
https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js}}
{{HTTP/1.1 200 OK}}
{{Server: nginx/1.10.3 (Ubuntu)}}
{{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
{{Content-Type: application/javascript}}
{{Connection: keep-alive}}
{{Access-Control-Allow-Origin: *}}
{{Set-Cookie: JSESSIONID=604F281C24DFD6C8897F0BEBDD123339; Path=/xwiki/; 
HttpOnly}}
{{Cache-Control: public}}
{{Expires: Sat, 16 Mar 2019 10:10:25 GMT}}
{{Last-Modified: Fri, 16 Mar 2018 10:10:25 GMT}}
{{Vary: Accept-Encoding}}
{{Strict-Transport-Security: max-age=31536000; includeSubdomains}}
{{X-Frame-Options: SAMEORIGIN}}

The problem occurs when my XWiki installation is behind an SSL proxy
(nginx) and I enable the {{add_header X-Content-Type-Options nosniff;}} header.

Modern browsers return the following error:
{quote}Refused to execute script from 
'[https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]'
 because its MIME type ('text/x-matlab') is not executable, and strict MIME 
type checking is enabled.
{quote}
My "solution" is to disable strict MIME type checking in the SSL proxy, but
I don't think that is ideal. It'd be better if the MATLAB parser didn't claim
random minified JS files as its own.



[jira] [Created] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-16 Thread pdwalker (JIRA)
pdwalker created TIKA-2608:
--

 Summary: tika matlab parser incorrectly identifies content type of 
minified javascript file
 Key: TIKA-2608
 URL: https://issues.apache.org/jira/browse/TIKA-2608
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.17
 Environment: * xwiki 10.1,
 * Tomcat 8 (8.0.32-1ubuntu1)
 * Ubuntu 16.04.4 LTS
 * Oracle Java 1.8.0_161-b12
Reporter: pdwalker


When Tika "detects" the following file, it returns the wrong content type:

{{$ curl -I 
https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js}}
{{HTTP/1.1 200 OK}}
{{Server: nginx/1.10.3 (Ubuntu)}}
{{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
{{Content-Type: text/x-matlab}}
{{Connection: keep-alive}}
{{Access-Control-Allow-Origin: *}}
{{Set-Cookie: JSESSIONID=B1FD2399240BB7BEA6EC83095806491F; Path=/xwiki/; 
HttpOnly}}
{{Cache-Control: public}}
{{Expires: Sat, 16 Mar 2019 10:09:54 GMT}}
{{Last-Modified: Fri, 16 Mar 2018 10:09:54 GMT}}
{{Strict-Transport-Security: max-age=31536000; includeSubdomains}}
{{X-Frame-Options: SAMEORIGIN}}

However, the unminified version of the same file returns the correct type:

{{$ curl -I 
https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js}}
{{HTTP/1.1 200 OK}}
{{Server: nginx/1.10.3 (Ubuntu)}}
{{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
{{Content-Type: application/javascript}}
{{Connection: keep-alive}}
{{Access-Control-Allow-Origin: *}}
{{Set-Cookie: JSESSIONID=604F281C24DFD6C8897F0BEBDD123339; Path=/xwiki/; 
HttpOnly}}
{{Cache-Control: public}}
{{Expires: Sat, 16 Mar 2019 10:10:25 GMT}}
{{Last-Modified: Fri, 16 Mar 2018 10:10:25 GMT}}
{{Vary: Accept-Encoding}}
{{Strict-Transport-Security: max-age=31536000; includeSubdomains}}
{{X-Frame-Options: SAMEORIGIN}}

The problem occurs when my XWiki installation is behind an SSL proxy
(nginx) and I enable the {{add_header X-Content-Type-Options nosniff;}} header.

Modern browsers return the following error:
{quote}Refused to execute script from 
'[https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]'
 because its MIME type ('text/x-matlab') is not executable, and strict MIME 
type checking is enabled.
{quote}
My "solution" is to disable strict MIME type checking in the SSL proxy, but
I don't think that is ideal. It'd be better if the MATLAB parser didn't claim
random minified JS files as its own.
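Until detection is fixed, one possible mitigation on the serving side is to let a trusted file extension take precedence over content sniffing for static script assets. The sketch below is hypothetical policy code, not Tika or XWiki behaviour: {{choose_content_type}}, the extension table, and the choice of {{text/javascript}} are all assumptions.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Hypothetical table: extensions whose type we trust more than content sniffing.
TRUSTED_EXTENSIONS = {
    ".js": "text/javascript",
    ".mjs": "text/javascript",
    ".css": "text/css",
}


def choose_content_type(url_or_path: str, sniffed: str) -> str:
    """Return the extension-based type for trusted extensions, else the sniffed type."""
    # Work on the decoded URL path so query strings and %-escapes don't hide the extension.
    path = unquote(urlsplit(url_or_path).path)
    _, ext = posixpath.splitext(path)  # "mxGraphEditor.min.js" -> ".js"
    return TRUSTED_EXTENSIONS.get(ext.lower(), sniffed)
```

The trade-off is that a trusted extension now overrides the content entirely, so a rule like this is only reasonable for assets the server itself publishes, such as bundled webjars.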


