[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...
Github user JPercivall commented on the issue: https://github.com/apache/nifi/pull/252 @jskora were you able to rebase and open a new PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...
Github user jskora commented on the issue: https://github.com/apache/nifi/pull/252 Ok, I think I've resolved the problems. The Tika conflict was fixed by updating the Tika and Metadata Extractor dependency versions and adjusting ExtractImageMetadata for the new attribute names in the newer parser. I've stored my current work on my GitHub branch [NIFI-615-v2](https://github.com/jskora/nifi/tree/NIFI-615-v2). I still need to rebase that, and I probably won't be able to do so until Monday. Please take a look if you get a chance and I'll try to get the pull request updated on Monday.
[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...
Github user JPercivall commented on the issue: https://github.com/apache/nifi/pull/252 Thanks for the updates @jskora. Unless you say otherwise, I'm going to assume you're gonna get this knocked out for 0.7.0 as soon as possible. If you think you need more time and can let it slide, feel free to remove the 0.7.0 tag.
[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...
Github user jskora commented on the issue: https://github.com/apache/nifi/pull/252 Ironically, this boils down to a dependency version conflict: ExtractMediaMetadata uses Tika 1.7, which uses Drew Noakes' Metadata Extractor 2.6.2, while the existing ExtractImageMetadata used Drew Noakes' Metadata Extractor 2.7.2. I'm looking into ways to resolve this. One option may be to return ExtractImageMetadata to its own nifi-image-bundle nar and create a new nifi-media-bundle-nar, instead of the rename of n-image-b-n to n-media-b-n that I originally did. Another is to look into newer Tika versions, but I don't know if that will create other problems. I'll keep you posted.
[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...
Github user jskora commented on the issue: https://github.com/apache/nifi/pull/252 So, the problems you had with the JPG file are two separate issues.

- The default `BodyContentHandler` passed to the parser can only handle 100,000 characters, so any file larger than that produces that message and is only partially parsed. **_I fixed this by adding an optional `Content Buffer Size` property, with related tests, that allows the limit to be increased or set to unlimited._**
- The Tika JPEG parser appears to have a reference error. It references the class `com.drew.lang.BufferReader`, but that class does not appear to be in the Tika package or in the version of the Drew Noakes Metadata Extractor package that Tika references. **_I'm looking into whether updating to a newer Tika will fix this or if we have dependency conflicts causing the problem._**
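To make the first issue concrete, here is a minimal, stdlib-only sketch of how a write-limited SAX content handler behaves. It is not Tika's code: `LimitedTextHandler` and its `text()` method are hypothetical names invented for illustration, and the only Tika behavior assumed is that `BodyContentHandler(int writeLimit)` treats `-1` as "no limit". The sketch shows why a parse stops partway with text up to the limit still available, and why a negative limit lets everything through.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a write-limited handler; not part of Tika or NiFi.
class LimitedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit; // negative value means "unlimited" (assumed convention)

    LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit < 0 || text.length() + length <= writeLimit) {
            text.append(ch, start, length);
            return;
        }
        // Keep whatever still fits, then abort the parse by throwing.
        text.append(ch, start, writeLimit - text.length());
        throw new SAXException("write limit of " + writeLimit + " characters reached");
    }

    String text() {
        return text.toString();
    }

    public static void main(String[] args) throws SAXException {
        char[] data = "hello, world".toCharArray();

        LimitedTextHandler limited = new LimitedTextHandler(5);
        try {
            limited.characters(data, 0, data.length);
        } catch (SAXException e) {
            System.out.println("stopped: " + e.getMessage());
        }
        System.out.println(limited.text()); // prints "hello"

        LimitedTextHandler unlimited = new LimitedTextHandler(-1);
        unlimited.characters(data, 0, data.length);
        System.out.println(unlimited.text()); // prints "hello, world"
    }
}
```

A processor property such as the `Content Buffer Size` mentioned above would presumably just feed its value into the handler's limit; how the PR wires that up is not shown in this thread.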
[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...
Github user JPercivall commented on the issue: https://github.com/apache/nifi/pull/252 I set this up to analyze all the files I have in my downloads folder to see what happened and what errors I came across. I found a couple of interesting ones. The first should be a configurable property. The second looks like a problem extracting metadata from JPGs. You can recreate the problem by setting up a unit test to analyze simple.jpg (in test resources):

org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your document contained more than 10 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).

2016-06-15 15:33:22,677 ERROR [Timer-Driven Process Thread-7] o.a.n.p.media.ExtractMediaMetadata ExtractMediaMetadata[id=c4e52258-dac5-43b1-b951-2d7f9a7ebf6c] ExtractMediaMetadata[id=c4e52258-dac5-43b1-b951-2d7f9a7ebf6c] failed to process due to java.lang.NoClassDefFoundError: com/drew/lang/BufferReader; rolling back session: java.lang.NoClassDefFoundError: com/drew/lang/BufferReader
2016-06-15 15:33:22,679 ERROR [Timer-Driven Process Thread-7] o.a.n.p.media.ExtractMediaMetadata
java.lang.NoClassDefFoundError: com/drew/lang/BufferReader
    at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56) ~[na:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) ~[na:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) ~[na:na]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) ~[na:na]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136) ~[na:na]
    at org.apache.nifi.processors.media.ExtractMediaMetadata.tika_parse(ExtractMediaMetadata.java:239) ~[na:na]
    at org.apache.nifi.processors.media.ExtractMediaMetadata.access$000(ExtractMediaMetadata.java:71) ~[na:na]
    at org.apache.nifi.processors.media.ExtractMediaMetadata$1.process(ExtractMediaMetadata.java:215) ~[na:na]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1806) ~[nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1777) ~[nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.processors.media.ExtractMediaMetadata.onTrigger(ExtractMediaMetadata.java:211) ~[na:na]
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) ~[nifi-api-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1139) [nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:139) [nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:49) [nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:124) [nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_74]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_74]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_74]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_74]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_74]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_74]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
Caused by: java.lang.ClassNotFoundException: com.drew.lang.BufferReader
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_74]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_74]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_74]
    ... 23 common frames omitted
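A `NoClassDefFoundError` like the one in this trace usually means the class was available when the caller was compiled but is missing from the classloader at runtime. One way to narrow that down is to ask a classloader directly whether it can see the class and, if so, which jar it came from. The sketch below uses only the JDK; `ClassProbe` and `locate` are hypothetical names, not part of NiFi or Tika, and it would need to run with the same classloader as the failing code (for example from a unit test in the same module) to say anything useful.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of NiFi or Tika.
class ClassProbe {
    /**
     * Returns the location the class was loaded from, "bootstrap/unknown" when
     * the JVM reports no code source, or "MISSING" when it cannot be loaded.
     */
    static String locate(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap/unknown" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "MISSING";
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // The class the Tika JPEG parser could not find in the trace above.
        System.out.println("com.drew.lang.BufferReader -> "
                + locate("com.drew.lang.BufferReader", loader));
        // A class that is always present, for comparison.
        System.out.println("java.lang.String -> " + locate("java.lang.String", loader));
    }
}
```

If the probe reports a jar location for some `com.drew` classes but `MISSING` for `BufferReader`, that points toward a different Metadata Extractor version being on the classpath than the one the parser was built against.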
[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...
Github user jskora commented on the issue: https://github.com/apache/nifi/pull/252 @JPercivall, I'll try to get this updated tomorrow morning. This was originally done in February, but I'm hoping the rebase won't be too bad.
[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...
Github user JPercivall commented on the issue: https://github.com/apache/nifi/pull/252 Gonna review this for potential inclusion in 0.7.0. There are merge conflicts, @jskora can you rebase it?