[ 
https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376851#comment-17376851
 ] 

Kenneth William Krugler commented on TIKA-3466:
-----------------------------------------------

Hi [~psakkanan] - that namespace is inside of the {{<script>}} tag, so it 
wouldn't help for other cases. And FWIR {{xmlns}} isn't valid in a {{<script>}} 
tag, so checking for an invalid attribute (given the tag) in an invalid tag 
(given its position in the HTML) doesn't seem like a win.
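
To make the "wouldn't help for other cases" point concrete, here is a rough sketch of root-element matching. It uses only the JDK's StAX API, and it is not Tika's actual detector code: {{RootXmlSketch}}, {{detect}}, and the sets of accepted root names are all made up for illustration. It only shows that adding {{script}} to the accepted roots fixes this one file, while any other leading XHTML element would still come back as plain XML.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; this is NOT how Tika implements root-XML detection.
final class RootXmlSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // The file from the issue: <script> is the root element.
    static final String SCRIPT_FIRST =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
        + "<script xmlns=\"" + XHTML_NS + "\"><![CDATA[\n  alert(555);\n  ]]></script>";

    // A made-up sibling case: some other XHTML element is the root.
    static final String DIV_FIRST =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
        + "<div xmlns=\"" + XHTML_NS + "\">hello</div>";

    // Returns application/xhtml+xml only when the root element is in the XHTML
    // namespace AND its local name is one of the accepted roots.
    static String detect(String xml, Set<String> xhtmlRoots) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        QName root = reader.getName();
                        boolean xhtml = XHTML_NS.equals(root.getNamespaceURI())
                                && xhtmlRoots.contains(root.getLocalPart());
                        return xhtml ? "application/xhtml+xml" : "application/xml";
                    }
                }
                return "application/xml";
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Current rule (html only), then the proposed rule (html + script).
        System.out.println(detect(SCRIPT_FIRST, Set.of("html")));
        System.out.println(detect(SCRIPT_FIRST, Set.of("html", "script")));
        // The proposed rule still does not cover a different leading element.
        System.out.println(detect(DIV_FIRST, Set.of("html", "script")));
    }
}
```

Every additional leading element would need its own entry under this kind of rule, which is why special-casing {{script}} doesn't look like a general fix.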

> Cannot detect mimetype of xhtml file when script is first node instead of html
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-3466
>                 URL: https://issues.apache.org/jira/browse/TIKA-3466
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, mime
>    Affects Versions: 1.27
>            Reporter: Packiaraj Sakkanan
>            Priority: Major
>
> The MIME type of the XHTML file below is detected as 'application/xml' instead of 
> 'application/xhtml+xml':
> {code:java}
> <?xml version="1.0" encoding="UTF-8" ?>
> <script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[
>   alert(555);
>   ]]></script>
> {code}
>  
> One possible solution is to add a 'script' root-XML entry to tika-mimetypes.xml, for example:
> {code:java}
> <mime-type type="application/xhtml+xml">
>   <!-- The magic priority for xhtml+xml needs to be lower than that of -->
>   <!--  files that contain HTML within them, e.g. mime emails -->
>   <magic priority="40">
>     <match value="&lt;html xmlns=" type="string" offset="0:8192"/>
>   </magic>
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="script"/>
>   <glob pattern="*.xhtml"/>
>   <glob pattern="*.xht"/>
> </mime-type>
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
