[ 
https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377563#comment-17377563
 ] 

Packiaraj Sakkanan commented on TIKA-3466:
------------------------------------------

Hi [~tallison],
 Here is a stripped-down version of the use case.

 1. A user can upload a file through the web application's upload 
functionality (say, for batch processing).
 2. The application has a list of file types that users are allowed to upload.
    a. The uploaded file is validated with Tika: we detect its media type and 
check it against the allowed types. Since XML is a valid format for exchanging 
data between applications, we need to allow XML uploads; thus 
'application/xml' is allowed.
    b. Since this is batch processing, the file is actually processed in a 
separate thread (and processing time varies), so we cannot delete the file 
when the HTTP request completes.
 3. An uploaded file can be downloaded by any authorized user.
 4. A malicious user can now craft a file that evades the file validation in 
step 2.a and execute an attack (giving more detail on this step would reveal 
security-related information).


 To summarize,
 we would like to allow only XML files to be uploaded into the app; HTML and 
XHTML should be rejected, as they are not on the allowed list.
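
 As an illustration of the kind of check described above, here is a minimal 
sketch of an application-side guard that could run alongside the media-type 
check in step 2.a. It does not use Tika at all: it relies only on the JDK's 
StAX parser, and the class and method names (UploadXmlCheck, isAcceptableXml) 
are hypothetical. It assumes that rejecting any element in the XHTML namespace 
is acceptable for the application.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Tika.
class UploadXmlCheck {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /**
     * Returns true only if the stream parses as well-formed XML, contains at
     * least one element, and no element is in the XHTML namespace.
     */
    public static boolean isAcceptableXml(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in untrusted uploads.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                boolean sawElement = false;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        sawElement = true;
                        // Reject <html>, <script>, or any other XHTML element,
                        // whether it is the root or nested deeper.
                        if (XHTML_NS.equals(reader.getNamespaceURI())) {
                            return false;
                        }
                    }
                }
                return sawElement;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML: treat as not acceptable.
            return false;
        }
    }

    /** Convenience overload for in-memory content. */
    public static boolean isAcceptableXml(String xml) {
        return isAcceptableXml(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        String xhtmlScript = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<script xmlns=\"http://www.w3.org/1999/xhtml\"><![CDATA[\n"
                + "  alert(555);\n"
                + "  ]]></script>";
        String plainXml = "<orders><order id=\"1\"/></orders>";
        System.out.println("xhtml script accepted: " + isAcceptableXml(xhtmlScript));
        System.out.println("plain xml accepted:    " + isAcceptableXml(plainXml));
    }
}
```

 The sketch scans every element rather than only the root, on the assumption 
that an XHTML element nested inside another root is just as unwanted; an 
application that legitimately embeds XHTML fragments in its XML would need a 
narrower rule.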
  

> Cannot detect mimetype of xhtml file when script is first node instead of html
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-3466
>                 URL: https://issues.apache.org/jira/browse/TIKA-3466
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, mime
>    Affects Versions: 1.27
>            Reporter: Packiaraj Sakkanan
>            Priority: Major
>
> The MIME type of the XHTML file below is detected as 'application/xml' 
> instead of 'application/xhtml+xml':
> {code:java}
> <?xml version="1.0" encoding="UTF-8" ?>
> <script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[
>   alert(555);
>   ]]></script>
> {code}
>  
>  One possible solution is to add 'script' as a root-XML localName in 
>  tika-mimetypes.xml, like 
> {code:java}
> <mime-type type="application/xhtml+xml">
>   <!-- The magic priority for xhtml+xml needs to be lower than that of -->
>   <!--  files that contain HTML within them, e.g. mime emails -->
>   <magic priority="40">
>     <match value="&lt;html xmlns=" type="string" offset="0:8192"/>
>   </magic>
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="script"/>
>   <glob pattern="*.xhtml"/>
>   <glob pattern="*.xht"/>
> </mime-type>
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
