[ https://issues.apache.org/jira/browse/TIKA-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186218#comment-15186218 ]

Namitha Sanjeeva Ganiga commented on TIKA-1881:
-----------------------------------------------

For Atom, RSS and RDF:
These patterns came from the FHT analysis of these files. We found that some
of these files were classified as application/octet-stream, and all three
types had many occurrences of the pattern within roughly the first 50 bytes. I
based this purely on the analysis, so I cannot point to any documentation of
it on the web. As you mentioned, if you advise removing these patterns to be
on the safe side, I will update the pull request to remove them.
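A simplified way to reproduce this kind of observation is to count how many sample files contain a pattern within their first 50 bytes. The sketch below is not the FHT analysis itself, only a plain occurrence count. The class and method names (`HeaderPatternStats`, `occursInHeader`, `headerHitRate`) are hypothetical, and the sample headers are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class HeaderPatternStats {

    /** True if the pattern occurs at any start position fully inside the first {@code window} bytes. */
    static boolean occursInHeader(byte[] data, byte[] pattern, int window) {
        int limit = Math.min(data.length, window);
        // Only consider start positions where the whole pattern fits inside the window.
        for (int start = 0; start + pattern.length <= limit; start++) {
            int i = 0;
            while (i < pattern.length && data[start + i] == pattern[i]) {
                i++;
            }
            if (i == pattern.length) {
                return true;
            }
        }
        return false;
    }

    /** Fraction of sample files whose header window contains the ASCII pattern. */
    static double headerHitRate(List<byte[]> samples, String pattern, int window) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        byte[] p = pattern.getBytes(StandardCharsets.US_ASCII);
        long hits = samples.stream().filter(s -> occursInHeader(s, p, window)).count();
        return (double) hits / samples.size();
    }

    public static void main(String[] args) {
        // Made-up sample headers, for illustration only.
        List<byte[]> samples = List.of(
                "<?xml version=\"1.0\"?><feed xmlns=\"http://www.w3.org/2005/Atom\">"
                        .getBytes(StandardCharsets.US_ASCII),
                "<?xml version=\"1.0\"?><rss version=\"2.0\">".getBytes(StandardCharsets.US_ASCII),
                "plain text with no markup".getBytes(StandardCharsets.US_ASCII));
        // One of the three samples contains "feed" in its first 50 bytes.
        System.out.println(headerHitRate(samples, "feed", 50));
    }
}
```

A high hit rate across many real files of one type, combined with a low rate elsewhere, is the kind of evidence that would support a ranged magic pattern.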

For PostScript: I will redo this pattern in the pull request itself.

> On updating mime magic for existing mime types
> ----------------------------------------------
>
>                 Key: TIKA-1881
>                 URL: https://issues.apache.org/jira/browse/TIKA-1881
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.11
>            Reporter: Namitha Sanjeeva Ganiga
>            Priority: Minor
>              Labels: mime
>             Fix For: 1.11
>
>
> Updated MIME magic for 6 MIME types:
> 1. application/postscript: files begin with the pattern "%!PS-Adobe-3.0 EPSF-3.0".
> 2. application/wordperfect: files begin with the pattern "ÿWPC".
> 3. image/tiff: updated pattern "MM.+" for the big-endian format (occurs at
> the beginning of TIFF files).
> 4. application/rdf+xml: updated pattern "rdf" (from byte offset 5 to 400)
> 5. application/atom+xml: updated pattern "feed" (from byte offset 5 to 50)
> 6. application/rss+xml: updated pattern "rss" (from byte offset 5 to 50)
> https://github.com/NamithaGS/tika/commit/780100767e24505a24595ea6db43978d0700e220
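The following minimal Java sketch makes the fixed and ranged offsets in the list above concrete. It is not Tika's detector or its API. `RangeMagicSketch`, `Rule`, `sampleRules` and `detect` are hypothetical names. The range bounds are assumed to be inclusive start offsets, and overlapping matches are resolved by list order. Tika's own mime magic also carries priorities, which this sketch does not model. The sketch covers five of the six patterns and leaves out TIFF, because the exact bytes behind "MM.+" are not spelled out here.

```java
import java.nio.charset.StandardCharsets;

public class RangeMagicSketch {

    /** Hypothetical rule type: the pattern may start at any offset in [from, to], inclusive. */
    record Rule(String mimeType, byte[] pattern, int from, int to) {}

    /** ISO-8859-1 maps each char in the range 0x00-0xFF to the single byte with the same value. */
    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Try every allowed start offset until the whole pattern matches or the data runs out. */
    static boolean matches(byte[] data, Rule rule) {
        byte[] p = rule.pattern();
        for (int start = rule.from(); start <= rule.to() && start + p.length <= data.length; start++) {
            int i = 0;
            while (i < p.length && data[start + i] == p[i]) {
                i++;
            }
            if (i == p.length) {
                return true;
            }
        }
        return false;
    }

    /** Five of the six patterns from the issue description (TIFF omitted). */
    static Rule[] sampleRules() {
        return new Rule[] {
            new Rule("application/postscript", latin1("%!PS-Adobe-3.0 EPSF-3.0"), 0, 0),
            new Rule("application/wordperfect", latin1("\u00FFWPC"), 0, 0), // bytes FF 57 50 43
            new Rule("application/atom+xml", latin1("feed"), 5, 50),
            new Rule("application/rss+xml", latin1("rss"), 5, 50),
            new Rule("application/rdf+xml", latin1("rdf"), 5, 400),
        };
    }

    /** First matching rule wins; otherwise fall back to application/octet-stream. */
    static String detect(byte[] data, Rule[] rules) {
        for (Rule r : rules) {
            if (matches(data, r)) {
                return r.mimeType();
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] rss = latin1("<?xml version=\"1.0\"?><rss version=\"2.0\">");
        System.out.println(detect(rss, sampleRules()));
    }
}
```

Running `main` prints `application/rss+xml`. The lower bound of 5 on the XML rules means that a pattern at offset 0 never matches. For example, a plain-text file starting with "feed" still falls through to application/octet-stream.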



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
