[ 
https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376798#comment-17376798
 ] 

Packiaraj Sakkanan edited comment on TIKA-3466 at 7/7/21, 7:06 PM:
-------------------------------------------------------------------

Wouldn't be sufficent to check namepsace alone 
[{color:#ff8b00}xmlns="http://www.w3.org/1999/xhtml"{color:#172b4d}] to decided 
if its XHTML or XML document?{color}
 {color}


was (Author: psakkanan):
Wouldn't be sufficent to check namepsace alone 
[{color:#ff8b00}xmlns="http://www.w3.org/1999/xhtml"{color:#172b4d}] to decided 
if its XHTML or XML document?

{color}{color}

> Cannot detect mimetype of xhtml file when script is first node instead of html
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-3466
>                 URL: https://issues.apache.org/jira/browse/TIKA-3466
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, mime
>    Affects Versions: 1.27
>            Reporter: Packiaraj Sakkanan
>            Priority: Major
>
> mime-type of below xhtml file deduced as 'application/xml' instead of 
> 'application/xhtml+xml' 
> {code:java}
> <?xml version="1.0" encoding="UTF-8" ?>
> <script xmlns="http://www.w3.org/1999/xhtml";><![CDATA[
>   alert(555);
>   ]]></script>
> {code}
>  
>  one possible solution is to add 'script' in tika-mimetypes.xml, like 
> {code:java}
> <mime-type type="application/xhtml+xml">
>   <!-- The magic priority for xhtml+xml needs to be lower than that of -->
>   <!--  files that contain HTML within them, e.g. mime emails -->
>   <magic priority="40">
>     <match value="&lt;html xmlns=" type="string" offset="0:8192"/>
>   </magic>
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml"; localName="html"/>
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml"; localName="script"/>
>   <glob pattern="*.xhtml"/>
>   <glob pattern="*.xht"/>
> </mime-type>
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to