[
https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377563#comment-17377563
]
Packiaraj Sakkanan commented on TIKA-3466:
------------------------------------------
Hi [~tallison]
Here is the stripped-down version of the use case.
1. A user can upload a file through the web application's upload feature (say,
for batch processing).
2. The application has a list of valid file types that users are allowed to
upload.
a. The uploaded file is validated using Tika: we detect its media type and
check it against the allowed file types. Since XML is a valid file type for
exchanging data between applications, we need to allow XML uploads; thus
'application/xml' is allowed.
b. Since this is batch processing, the actual processing of the file takes
place in a separate thread (and the processing time varies), so we cannot
delete the file when the HTTP request completes.
3. An uploaded file can be downloaded by any authorized user.
4. A malicious user can now craft a file that evades the file validation in
step 2.a and execute an attack (details of this step would reveal more
security-related information).
To summarize:
we would like to allow only XML files to be uploaded into the app; HTML and
XHTML should be rejected because they are not on the allowed list.
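Until detection is changed, one possible stopgap on the application side is an extra check that runs after Tika has reported 'application/xml'. The sketch below is only an illustration and is not part of Tika: the class XhtmlNamespaceGuard and the method isAcceptableXml are hypothetical names, and it assumes that rejecting any element in the XHTML namespace (not just the root) is acceptable for our use case. It uses only the JDK's StAX parser and fails closed on malformed input.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not a Tika API: a second check for uploads that
// were already detected as application/xml.
class XhtmlNamespaceGuard {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /**
     * Returns true only if the bytes parse as well-formed XML and no
     * element is in the XHTML namespace. Malformed input is rejected.
     */
    static boolean isAcceptableXml(byte[] data) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Harden the parser: no DTD processing, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(new ByteArrayInputStream(data));
            while (reader.hasNext()) {
                // Inspect every start tag, so a nested XHTML <script> is caught too.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && XHTML_NS.equals(reader.getNamespaceURI())) {
                    return false;
                }
            }
            return true;
        } catch (XMLStreamException e) {
            return false; // fail closed on anything that does not parse
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do if closing the reader fails
                }
            }
        }
    }
}
```

With this in place, the upload handler would accept a file only when Tika reports 'application/xml' and isAcceptableXml returns true. It does not replace a fix in the detector; it only narrows what gets stored for later download.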
> Cannot detect mimetype of xhtml file when script is first node instead of html
> ------------------------------------------------------------------------------
>
> Key: TIKA-3466
> URL: https://issues.apache.org/jira/browse/TIKA-3466
> Project: Tika
> Issue Type: Bug
> Components: detector, mime
> Affects Versions: 1.27
> Reporter: Packiaraj Sakkanan
> Priority: Major
>
> The MIME type of the XHTML file below is detected as 'application/xml'
> instead of 'application/xhtml+xml':
> {code:java}
> <?xml version="1.0" encoding="UTF-8" ?>
> <script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[
> alert(555);
> ]]></script>
> {code}
>
> One possible solution is to add 'script' as a root-XML localName in
> tika-mimetypes.xml, like this:
> {code:java}
> <mime-type type="application/xhtml+xml">
> <!-- The magic priority for xhtml+xml needs to be lower than that of -->
> <!-- files that contain HTML within them, e.g. mime emails -->
> <magic priority="40">
> <match value="&lt;html xmlns=" type="string" offset="0:8192"/>
> </magic>
> <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
> <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="script"/>
> <glob pattern="*.xhtml"/>
> <glob pattern="*.xht"/>
> </mime-type>
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)