[ 
https://issues.apache.org/jira/browse/TIKA-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603038#comment-17603038
 ] 

Nick Burch commented on TIKA-3308:
----------------------------------

Our HTML mime type has both root-XML tags for well-formed documents and a 
bunch of magic for the rest. So adding some magic for these SVG documents as 
well is possible in theory.

Checking for {{<svg xmlns="http://www.w3.org/2000/svg"}} with a decent priority 
should be fine, but I'm not sure we'd want to look for just {{<svg}}
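
To make the idea concrete, here is a minimal sketch in plain Java of the kind of 
check I mean. It is not Tika code: the class name, the 1024-byte window and the 
regex are all assumptions for illustration, and it is a bit more lenient than the 
literal string above because it lets other attributes come before xmlns. A bare 
{{<svg}} with no namespace is deliberately not matched.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: it only illustrates the proposed check.
final class SvgMagicSketch {

    // Assumed window size: only the start of the content is inspected.
    private static final int PREFIX_LENGTH = 1024;

    // An <svg ...> start tag that declares the SVG namespace, in either quote style.
    private static final Pattern SVG_WITH_NAMESPACE = Pattern.compile(
            "<svg\\s[^>]*xmlns\\s*=\\s*[\"']http://www\\.w3\\.org/2000/svg[\"']");

    private SvgMagicSketch() {
    }

    // Returns true if the first PREFIX_LENGTH bytes contain a namespaced <svg> tag.
    static boolean looksLikeSvg(byte[] content) {
        int length = Math.min(content.length, PREFIX_LENGTH);
        String prefix = new String(content, 0, length, StandardCharsets.UTF_8);
        return SVG_WITH_NAMESPACE.matcher(prefix).find();
    }
}
```

With this, a file that starts directly with a namespaced {{<svg ...>}} tag is 
matched without any XML declaration, while plain text that merely mentions 
{{<svg}} is not.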

Thoughts [~tallison] ?

> SVG file without xml declaration tag is detected as text/plain
> --------------------------------------------------------------
>
>                 Key: TIKA-3308
>                 URL: https://issues.apache.org/jira/browse/TIKA-3308
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.25
>            Reporter: Anas Hammani
>            Priority: Minor
>         Attachments: logo-luma.svg
>
>
> The SVG file attached to the issue is interpreted as *text/plain* by
> {code:java}
> tika.detect(filePath){code}
>  
> If I add 
> {code:xml}
>  <?xml version="1.0" standalone="no"?> {code}
> at the beginning of the file, then Tika detects it as *image/svg+xml*.
>  
> When I read the specification, I see that the XML declaration is not 
> required for a file to be well-formed:
> [https://www.w3.org/TR/REC-xml/#sec-prolog-dtd]
>  
> It would be great if Tika could detect a file as SVG without the XML declaration.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
