[ https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15481534#comment-15481534 ]
Andre commented on NIFI-2374:
-----------------------------

[~joewitt]

Not sure if we are on the same page, but this is truly a version bump with no added functionality, especially around metadata extraction via parsers.

1 - To be honest, I am not sure we need the parsers... If I understand Tika correctly, the core library does identification, while the parsers would allow us to extract metadata from the identified files. I base this understanding on the following excerpt from the URL you linked:

bq. Please note that Apache Tika is able to detect a much wider range of formats than those listed below, this page only documents those formats from which Tika is able to extract metadata and/or textual content.

2 - The list is for parsers, not for the "file magic" performed by the [Detector|https://tika.apache.org/1.13/api/org/apache/tika/detect/Detector.html] we call here:

https://github.com/apache/nifi/blob/f987b216090f29719976ed1693be2ea358523aa5/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java#L134

I tried to find a better list but couldn't. :-(

3 - Very valid point... Afaik there are no changes with regard to NIFI-2667 :-)

So, just to emphasise again, my idea was only to bump the dependency version, without adding any additional Tika features.

Let me know if you would like some extra action; I will be happy to address it.

> IdentifyMimeType documentation is misleading
> --------------------------------------------
>
>                 Key: NIFI-2374
>                 URL: https://issues.apache.org/jira/browse/NIFI-2374
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 1.0.0, 0.7.0
>            Reporter: Andre
>            Assignee: Andre
>            Priority: Minor
>             Fix For: 1.1.0
>
>
> The current documentation of IdentifyMimeType mentions the processor is
> capable of identifying a reasonably small range of file types.
> However, upon inspecting the code, it becomes evident that the processor
> employs Apache Tika detectors and parsers (required to distinguish a ZIP file
> from a JAR).
> This means the list of file (MIME) types detected is the same as the one
> present in Tika's DefaultDetector.
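
The ZIP-versus-JAR remark in the issue description can be illustrated with a small sketch. It does not use Tika and is not how Tika or IdentifyMimeType is implemented; the class `ContainerSniffer` and its methods are hypothetical names, and treating the presence of a `META-INF/MANIFEST.MF` entry as the JAR signal is a simplifying assumption. The point is only that both formats start with the same magic bytes, so telling them apart means looking inside the container.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not Tika's or NiFi's actual logic.
class ContainerSniffer {

    // "File magic" check: a non-empty ZIP-based file (ZIP, JAR, ...)
    // starts with the local file header signature 'P' 'K' 0x03 0x04.
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K'
                && data[2] == 3 && data[3] == 4;
    }

    // Container-aware check: walk the entries and look for a JAR manifest.
    // Assumption: a manifest entry is enough to call the archive a JAR.
    static String identify(byte[] data) throws IOException {
        if (!looksLikeZip(data)) {
            return "application/octet-stream";
        }
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if ("META-INF/MANIFEST.MF".equals(entry.getName())) {
                    return "application/java-archive";
                }
            }
        }
        return "application/zip";
    }

    // Builds a tiny in-memory archive containing a single entry.
    static byte[] archiveWith(String entryName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write("Manifest-Version: 1.0\n".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plainZip = archiveWith("notes.txt");
        byte[] jarLike = archiveWith("META-INF/MANIFEST.MF");

        // Both archives pass the magic-byte check...
        System.out.println(looksLikeZip(plainZip) + " " + looksLikeZip(jarLike));
        // ...but only inspecting the entries separates them.
        System.out.println(identify(plainZip));
        System.out.println(identify(jarLike));
    }
}
```

In this sketch the magic-byte test alone returns the same answer for both archives, which is why a container-aware step is needed to report a more specific type.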