[ 
https://issues.apache.org/jira/browse/RAT-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838513#comment-17838513
 ] 

Claude Warren commented on RAT-150:
-----------------------------------

I am working on a patch that uses Tika to detect each file's type and map it 
to our own file types.  It seems that Tika will also give us the textual 
content of a file if we ask for it.  By default, however, it appears to strip 
comments from XML, so I still need to work out how to ask it for comments too.
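
As a rough illustration of the mapping step only, the sketch below takes a 
media type string (for example, one returned by Tika's detect(File) on the 
org.apache.tika.Tika facade) and maps it to a coarse category.  The class 
name MimeTypeMapper, the DocType enum, and the specific type lists are all 
hypothetical and are not the actual Rat types or the contents of the patch.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: maps a detected media type to a coarse document category. */
final class MimeTypeMapper {

    /** Illustrative categories only; not the real Rat document types. */
    enum DocType { STANDARD, ARCHIVE, BINARY, UNKNOWN }

    // Assumed examples of archive media types.
    private static final Set<String> ARCHIVE_TYPES = Set.of(
            "application/zip", "application/java-archive",
            "application/gzip", "application/x-tar");

    // Assumed examples of application/* types that are really text.
    private static final Set<String> TEXTUAL_APPLICATION_TYPES = Set.of(
            "application/xml", "application/json",
            "application/javascript", "application/x-sh");

    private MimeTypeMapper() { }

    static DocType classify(String mimeType) {
        if (mimeType == null || mimeType.trim().isEmpty()) {
            return DocType.UNKNOWN;
        }
        // Drop parameters such as "; charset=UTF-8" and normalise case.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);

        if (base.startsWith("text/")
                || TEXTUAL_APPLICATION_TYPES.contains(base)
                || base.endsWith("+xml")
                || base.endsWith("+json")) {
            return DocType.STANDARD;   // text family: candidate for a header check
        }
        if (ARCHIVE_TYPES.contains(base)) {
            return DocType.ARCHIVE;
        }
        if (base.equals("application/octet-stream")) {
            return DocType.UNKNOWN;    // the generic fallback type tells us nothing
        }
        return DocType.BINARY;         // everything else is treated as binary here
    }
}
```

With a mapping like this, only files classified as STANDARD would need their 
content read and scanned; how the other categories should be reported is left 
open here.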

> RAT should use Apache Tika to simply guess ignored [application/X] file types 
> and focus on the [text/Y] family as a sensible default
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: RAT-150
>                 URL: https://issues.apache.org/jira/browse/RAT-150
>             Project: Apache Rat
>          Issue Type: New Feature
>          Components: mime-meta-data, scan
>    Affects Versions: 0.8
>            Reporter: Chris A. Mattmann
>            Assignee: Claude Warren
>            Priority: Major
>
> RAT could use Apache Tika to automatically guess file types, obviating the 
> need to specify an explicit white list or black list.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
