[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading

2016-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390764#comment-15390764
 ] 

ASF GitHub Bot commented on NIFI-2374:
--

GitHub user trixpan opened a pull request:

https://github.com/apache/nifi/pull/712

NIFI-2374 and NIFI-2375 - Minor improve to documentation and version bump

NIFI-2374 - Today, when I was about to raise an issue, I noticed that 
although the IdentifyMimeType documentation provides a list of MIME types, that 
list is far from complete. This commit slightly changes the wording to reflect this.

NIFI-2375 - Bump the Apache Tika version used by the IdentifyMimeType and 
ExtractMediaMetadata processors

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/trixpan/nifi NIFI-2374

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/712.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #712


commit d172899d315062dc1432af64c1f2ee4b0ceb07ad
Author: Andre F de Miranda 
Date:   2016-07-23T15:08:42Z

NIFI-2374 - Adjust documentation wording to clarify IdentifyMimeType
 is a non exhaustive list of mime type values

commit b918bd4d8d140543a46f2988b43eb2bd95999c8a
Author: Andre F de Miranda 
Date:   2016-07-23T15:10:39Z

NIFI-2375 - Bump Apache Tika dependency version to 1.13




> IdentifyMimeType documentation is misleading
> 
>
> Key: NIFI-2374
> URL: https://issues.apache.org/jira/browse/NIFI-2374
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 0.7.0
>Reporter: Andre
>Assignee: Andre
>Priority: Minor
>
> The current documentation of IdentifyMimeType states that the processor is
> capable of identifying a reasonably small range of file types.
> However, upon inspecting the code, it becomes evident that the processor
> employs Apache Tika detectors and parsers (required to distinguish a ZIP file
> from a JAR).
> This means the list of file (MIME) types detected is the same as the one
> present in Tika's DefaultDetector.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading

2016-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403370#comment-15403370
 ] 

ASF GitHub Bot commented on NIFI-2374:
--

Github user trixpan commented on the issue:

https://github.com/apache/nifi/pull/712
  
@olegz mind having a look at this? Simple PR.




[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading

2016-08-08 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411934#comment-15411934
 ] 

Joseph Witt commented on NIFI-2374:
---

looking

> IdentifyMimeType documentation is misleading
> 
>
> Key: NIFI-2374
> URL: https://issues.apache.org/jira/browse/NIFI-2374
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 0.7.0
>Reporter: Andre
>Assignee: Andre
>Priority: Minor
> Fix For: 1.0.0
>
>
> The current documentation of IdentifyMimeType states that the processor is
> capable of identifying a reasonably small range of file types.
> However, upon inspecting the code, it becomes evident that the processor
> employs Apache Tika detectors and parsers (required to distinguish a ZIP file
> from a JAR).
> This means the list of file (MIME) types detected is the same as the one
> present in Tika's DefaultDetector.
>  





[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading

2016-08-08 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412336#comment-15412336
 ] 

Joseph Witt commented on NIFI-2374:
---

Hello [~trixpan].  I've moved this to 1.1.0 just given when it came into 
release and what appears to remain.  Findings:
1) The only thing we're depending on right now is tika-core so it doesn't 
include all the parsers.  
2) The list you reference as parsers is great but we need to validate what we 
actually include parsers for.  We can probably get this programmatically.  If 
not, this list appears safer to use than the asf-git repo entry: 
https://tika.apache.org/1.13/formats.html#Full_list_of_Supported_Formats
3) We need to review the version changes involved here because if it changes 
dependencies (and we'd definitely need to watch that) then we need to account 
for them in all the L&N.

One idea to consider is to make Tika-Parsers/Detection be split out into its 
own nar because it could be quite huge and quite powerful and would have some 
pretty specific dependency implications.  Tika is no doubt very cool and 
powerful so we should figure out the best way to get this incorporated. 
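
On finding 2, one way to check programmatically which parsers or detectors are actually bundled is to read the provider-registration files on the classpath. The sketch below is not NiFi or Tika code; it assumes Tika registers its implementations through standard `META-INF/services/<interface name>` files, and the class name `ServiceFileLister` is made up for illustration. It uses only the JDK, so it prints empty lists when no Tika jar is on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the provider classes declared for a service
// interface in every META-INF/services file visible to the class loader.
class ServiceFileLister {

    static List<String> listProviders(String serviceInterface) {
        List<String> providers = new ArrayList<>();
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            String resource = "META-INF/services/" + serviceInterface;
            for (URL url : Collections.list(loader.getResources(resource))) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String name = stripComment(line);
                        if (!name.isEmpty()) {
                            providers.add(name);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return providers;
    }

    // Service files allow '#' comments and blank lines; keep only the class name.
    static String stripComment(String line) {
        int hash = line.indexOf('#');
        return (hash >= 0 ? line.substring(0, hash) : line).trim();
    }

    public static void main(String[] args) {
        // Interface names assumed from Tika's public API packages.
        String[] services = {
            "org.apache.tika.parser.Parser",
            "org.apache.tika.detect.Detector"
        };
        for (String service : services) {
            System.out.println(service + " -> " + listProviders(service));
        }
    }
}
```

Inside NiFi this would have to run against the class loader of the NAR that bundles Tika, since each NAR is isolated; the thread context class loader used here is only a reasonable default for a standalone run.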

> IdentifyMimeType documentation is misleading
> 
>
> Key: NIFI-2374
> URL: https://issues.apache.org/jira/browse/NIFI-2374
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 0.7.0
>Reporter: Andre
>Assignee: Andre
>Priority: Minor
> Fix For: 1.1.0
>
>
> The current documentation of IdentifyMimeType states that the processor is
> capable of identifying a reasonably small range of file types.
> However, upon inspecting the code, it becomes evident that the processor
> employs Apache Tika detectors and parsers (required to distinguish a ZIP file
> from a JAR).
> This means the list of file (MIME) types detected is the same as the one
> present in Tika's DefaultDetector.
>  





[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading

2016-09-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478880#comment-15478880
 ] 

ASF GitHub Bot commented on NIFI-2374:
--

Github user trixpan commented on the issue:

https://github.com/apache/nifi/pull/712
  
@joewitt 

Now that 1.0.0 has been released, may I ask what you think we need 
to test in order to bump the Tika dependency?





[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading

2016-09-11 Thread Andre (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15481534#comment-15481534
 ] 

Andre commented on NIFI-2374:
-

[~joewitt]

Not sure if we are on the same page, but this is truly a version bump, with no 
added functionality, especially around metadata extraction via parsers.

1 - I am not sure if we need the parsers to be honest... If I understand Tika 
correctly, the core library does identification while the Parsers would allow 
us to extract metadata from the identified files.

I base this understanding on the following excerpt from the URL you linked:

bq. Please note that Apache Tika is able to detect a much wider range of 
formats than those listed below, this page only documents those formats from 
which Tika is able to extract metadata and/or textual content.

2 - The list is for parsers, not for the "file magic" performed by the 
[Detector|https://tika.apache.org/1.13/api/org/apache/tika/detect/Detector.html]
  we call here: 

https://github.com/apache/nifi/blob/f987b216090f29719976ed1693be2ea358523aa5/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java#L134

I tried to find a better list but couldn't. :-(

3 - Very valid point... Afaik no changes in regards to NIFI-2667 :-)




So just to emphasise again, my idea was just to bump the dependency version, 
without adding any additional Tika feature. Let me know if you would like any 
extra action; I will be happy to address it.
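
To make the detection-versus-parsing distinction above concrete, here is a minimal JDK-only sketch of "file magic". It is not Tika's or NiFi's implementation: the class `MagicSniffer`, the handful of signatures, and the rule "a ZIP with a `META-INF/MANIFEST.MF` entry is a JAR" are all simplifying assumptions. It shows why the leading bytes alone identify a ZIP container but cannot separate a JAR from a plain ZIP without looking at the entries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration of content-based MIME type detection.
class MagicSniffer {

    // Guesses a MIME type from the bytes alone, ignoring any file name.
    static String detect(byte[] data) {
        if (startsWith(data, 0x89, 'P', 'N', 'G')) {
            return "image/png";
        }
        if (startsWith(data, '%', 'P', 'D', 'F')) {
            return "application/pdf";
        }
        if (startsWith(data, 'P', 'K', 0x03, 0x04)) {
            // The magic bytes only say "ZIP container"; the entries decide the rest.
            return hasManifest(data) ? "application/java-archive" : "application/zip";
        }
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Assumed heuristic: treat an archive containing a manifest entry as a JAR.
    private static boolean hasManifest(byte[] data) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if ("META-INF/MANIFEST.MF".equals(e.getName())) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a one-entry ZIP archive in memory so the demo needs no files.
    static byte[] zipWith(String entryName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("demo".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(detect(zipWith("notes.txt")));
        System.out.println(detect(zipWith("META-INF/MANIFEST.MF")));
    }
}
```

Both archives built in `main` start with the same four bytes; only the entry scan tells them apart, and none of this involves extracting metadata or text from the content.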








[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading

2016-11-15 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669138#comment-15669138
 ] 

Joseph Witt commented on NIFI-2374:
---

ok - don't love that we need to provide that link to the tika source code in 
their git repo but there doesn't appear to be a better alternative and I agree 
with you that it is a useful listing for folks.  Have tweaked to the specific 
tag for 1.14.  Running some final tests locally and, assuming all is good there, 
am +1; will merge shortly and will close those and NIFI-2375 together.

sorry it took so long to look into this.  I was worried about dep tree 
licensing madness but there is no issue there at all.  tika-core and 
tika-parsers have no transitive deps or at least we're not pulling them.

thanks!

> IdentifyMimeType documentation is misleading
> 
>
> Key: NIFI-2374
> URL: https://issues.apache.org/jira/browse/NIFI-2374
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 0.7.0
>Reporter: Andre
>Assignee: Joseph Witt
>Priority: Minor
> Fix For: 1.1.0
>
>
> The current documentation of IdentifyMimeType states that the processor is
> capable of identifying a reasonably small range of file types.
> However, upon inspecting the code, it becomes evident that the processor
> employs Apache Tika detectors and parsers (required to distinguish a ZIP file
> from a JAR).
> This means the list of file (MIME) types detected is the same as the one
> present in Tika's DefaultDetector.
>  





[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading

2016-11-15 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669157#comment-15669157
 ] 

Joseph Witt commented on NIFI-2374:
---

oh wowza - tika-parsers def does... yeesh



[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading regarding tika detectors

2016-11-15 Thread Joseph Witt (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669294#comment-15669294
 ] 

Joseph Witt commented on NIFI-2374:
---

ok updating tika deps where possible, avoiding where it causes licensing 
problems, updated doc as shown in andre's PR.  Left the media nar at the older 
version and commented on it why.  That can be dealt with someday in NIFI-2375 
but that needs to happen once tika-parsers uses non problematic deps.

> IdentifyMimeType documentation is misleading regarding tika detectors
> -
>
> Key: NIFI-2374
> URL: https://issues.apache.org/jira/browse/NIFI-2374
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 0.7.0
>Reporter: Andre
>Assignee: Joseph Witt
>Priority: Minor
> Fix For: 1.1.0
>
>
> The current documentation of IdentifyMimeType states that the processor is
> capable of identifying a reasonably small range of file types.
> However, upon inspecting the code, it becomes evident that the processor
> employs Apache Tika detectors and parsers (required to distinguish a ZIP file
> from a JAR).
> This means the list of file (MIME) types detected is the same as the one
> present in Tika's DefaultDetector.
>  





[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading regarding tika detectors

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669342#comment-15669342
 ] 

ASF subversion and git services commented on NIFI-2374:
---

Commit 45a5f5295c611625cac700412630c16de2270b27 in nifi's branch 
refs/heads/master from [~joewitt]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=45a5f52 ]

NIFI-2374 This closes #712. updated to latest tika versions where possible, 
updated doc, commented why cannot update media nar




[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading regarding tika detectors

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669341#comment-15669341
 ] 

ASF subversion and git services commented on NIFI-2374:
---

Commit 3e571f9f1fe5ae4d58ed829a0d1e823a7a97d368 in nifi's branch 
refs/heads/master from Andre F de Miranda
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=3e571f9 ]

NIFI-2374 - Adjust documentation wording to clarify IdentifyMimeType
 is a non exhaustive list of mime type values




[jira] [Commented] (NIFI-2374) IdentifyMimeType documentation is misleading regarding tika detectors

2016-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669345#comment-15669345
 ] 

ASF GitHub Bot commented on NIFI-2374:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/712

