[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412336#comment-15412336
 ] 

Joseph Witt commented on NIFI-2374:
-----------------------------------

Hello [~trixpan].  I've moved this to 1.1.0, given how it was timed against 
the release and how much work appears to remain.  Findings:
1) The only thing we're depending on right now is tika-core so it doesn't 
include all the parsers.  
2) The list you reference as parsers is great, but we need to validate which 
formats we actually include parsers for.  We can probably get this 
programmatically.  If not, this list appears safer to use than the asf-git 
repo entry: 
"https://tika.apache.org/1.13/formats.html#Full_list_of_Supported_Formats"
3) We need to review the version changes involved here, because if they change 
dependencies (and we'd definitely need to watch that) then we need to account 
for those in all the L&N (LICENSE and NOTICE files).
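
On point 2, one possible way to check this programmatically is sketched 
below.  It relies on Tika registering its parsers and detectors through 
standard Java service files (META-INF/services), so listing those files on a 
given classpath shows which implementations are actually bundled.  The class 
and method names are hypothetical, and it assumes it is run with the same 
classpath as the nar that contains IdentifyMimeType; treat it as a rough 
sketch rather than a definitive check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper: lists the provider classes declared in
// META-INF/services files that are visible to a class loader.
public class BundledTikaServices {

    // Parse service-file lines: drop '#' comments and blank lines.
    static SortedSet<String> parseProviderLines(List<String> lines) {
        SortedSet<String> names = new TreeSet<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Collect provider class names for one service interface from every
    // matching service file the loader can see.
    static SortedSet<String> listProviders(ClassLoader loader, String serviceInterface) {
        SortedSet<String> names = new TreeSet<>();
        try {
            List<URL> files = Collections.list(
                    loader.getResources("META-INF/services/" + serviceInterface));
            for (URL url : files) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    names.addAll(parseProviderLines(
                            reader.lines().collect(Collectors.toList())));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        ClassLoader loader = BundledTikaServices.class.getClassLoader();
        String[] services = {
            "org.apache.tika.parser.Parser",
            "org.apache.tika.detect.Detector"
        };
        for (String service : services) {
            System.out.println(service + ":");
            for (String impl : listProviders(loader, service)) {
                System.out.println("  " + impl);
            }
        }
    }
}
```

If only tika-core is on the classpath, the parser list would be expected to 
come back short or empty, which would confirm point 1.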

One idea to consider is to split Tika-Parsers/Detection out into its own nar, 
because it could be quite large and quite powerful and would have some pretty 
specific dependency implications.  Tika is no doubt very cool and powerful, so 
we should figure out the best way to get this incorporated. 

> IdentifyMimeType documentation is misleading
> --------------------------------------------
>
>                 Key: NIFI-2374
>                 URL: https://issues.apache.org/jira/browse/NIFI-2374
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 1.0.0, 0.7.0
>            Reporter: Andre
>            Assignee: Andre
>            Priority: Minor
>             Fix For: 1.1.0
>
>
> The current documentation of IdentifyMimeType mentions the processor is 
> capable of identifying a reasonably small range of file types.
> However, upon inspecting the code, it becomes evident that the processor 
> employs Apache Tika detectors and parsers (required to distinguish a ZIP file 
> from a JAR).
> This means the list of file (MIME) types detected is the same as the one 
> present in Tika's DefaultDetector.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
