[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...

2016-06-20 Thread JPercivall
Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/252
  
@jskora were you able to rebase and open a new PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...

2016-06-16 Thread jskora
Github user jskora commented on the issue:

https://github.com/apache/nifi/pull/252
  
Ok, I think I've resolved the problems.  The Tika conflict was fixed by 
updating the Tika and Metadata Extractor dependency versions and adjusting 
ExtractImageMetadata for the new attribute names in the newer parser.

I've pushed my current work to my GitHub branch 
[NIFI-615-v2](https://github.com/jskora/nifi/tree/NIFI-615-v2).  I still need 
to rebase it, which I probably won't be able to do until Monday.  Please take 
a look if you get a chance and I'll try to get the pull request updated on Monday.


[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...

2016-06-16 Thread JPercivall
Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/252
  
Thanks for the updates @jskora. Unless you say otherwise, I'm going to 
assume you're gonna get this knocked out for 0.7.0 as soon as possible. If you 
think you need more time and it can slide, feel free to remove the 
0.7.0 tag.


[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...

2016-06-16 Thread jskora
Github user jskora commented on the issue:

https://github.com/apache/nifi/pull/252
  
Ironically, this boils down to a version dependency conflict: 
ExtractMediaMetadata uses Tika 1.7, which uses Drew Noakes' 
Metadata-Extract 2.6.2, while the existing ExtractImageMetadata uses Drew 
Noakes' Metadata-Extract 2.7.2.

I'm looking into ways to resolve this.  One option may be to return 
ExtractImageMetadata to its own nifi-image-bundle nar and create a new 
nifi-media-bundle-nar instead of the rename of n-image-b-n to n-media-b-n that I 
originally did.  Another is to look into newer Tika versions, but I don't know 
if that will create other problems.  I'll keep you posted.
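
For anyone chasing a conflict like this, a small diagnostic can show which jar a class is actually loaded from, or whether it is visible at all. The sketch below uses only the JDK; `ClassOrigin` and `describe` are hypothetical names for illustration and are not part of NiFi or Tika. It assumes you run it with the same class loader the processor uses, since a different class loader may see a different set of jars.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper for illustration only; not part of NiFi or Tika.
final class ClassOrigin {

    /** Describes where className is loaded from, or reports that it cannot be found. */
    static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            // JDK classes usually have no code source location.
            return className + " <- " + (location == null ? "bootstrap/JDK" : location.toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " NOT FOUND (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // The class named in the stack trace, plus the parser that references it.
        System.out.println(describe("com.drew.lang.BufferReader", loader));
        System.out.println(describe("org.apache.tika.parser.jpeg.JpegParser", loader));
    }
}
```

If the first line prints `NOT FOUND` while the second resolves to a Tika jar, the Metadata Extractor version on the classpath is likely not the one that Tika build was compiled against.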


[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...

2016-06-16 Thread jskora
Github user jskora commented on the issue:

https://github.com/apache/nifi/pull/252
  
So, the problems you had with the JPG file are two separate issues.

- The default `BodyContentHandler` passed to the parser can only handle 
100,000 characters, so any file that yields more text than that produces the 
`WriteLimitReachedException` message and is only partially parsed.  **_I fixed 
this by adding an optional `Content Buffer Size` property, with related tests, 
that allows the limit to be increased or set to unlimited._**
- The Tika JPEG parser appears to have a reference error.  It references 
the class `com.drew.lang.BufferReader` but that does not appear to be in the 
Tika package or the version of the Drew Noakes Metadata Extractor package that 
Tika references.  **_I'm looking into whether updating to a newer Tika will fix 
this or if we have dependency conflicts causing the problem._**
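
To illustrate the write-limit behavior from the first bullet, here is a minimal JDK-only sketch of a SAX handler that keeps text up to a limit and then signals that the limit was reached. `LimitedTextHandler` is a hypothetical, simplified stand-in and is not Tika's implementation or the code in this PR. As far as I know, Tika's own `BodyContentHandler` has a constructor that takes a write limit, with -1 disabling it, and the sketch follows the same convention.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Simplified, hypothetical stand-in for a write-limited content handler.
final class LimitedTextHandler extends DefaultHandler {
    private final int writeLimit;                    // max characters to keep; -1 means unlimited
    private final StringBuilder text = new StringBuilder();

    LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit < 0 || text.length() + length <= writeLimit) {
            text.append(ch, start, length);
            return;
        }
        // Keep whatever still fits, then signal that the limit was hit.
        text.append(ch, start, writeLimit - text.length());
        throw new SAXException("write limit of " + writeLimit + " characters reached");
    }

    /** Text collected so far; still available after the limit exception. */
    String text() {
        return text.toString();
    }
}
```

A caller that wants the partial text can catch the exception and still read `text()`, which mirrors the "text up to the limit is however available" note in the Tika message.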


[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...

2016-06-15 Thread JPercivall
Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/252
  
I ran this against all the files in my downloads folder to see what 
errors came up, and found a couple of interesting ones.

The limit hit in the first should be a configurable property. The second looks 
like a problem extracting metadata from JPGs. You can recreate it by 
setting up a unit test to analyze simple.jpg (in test resources):

org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your 
document contained more than 10 characters, and so your requested limit has 
been reached. To receive the full text of the document, increase your limit. 
(Text up to the limit is however available).

2016-06-15 15:33:22,677 ERROR [Timer-Driven Process Thread-7] o.a.n.p.media.ExtractMediaMetadata ExtractMediaMetadata[id=c4e52258-dac5-43b1-b951-2d7f9a7ebf6c] ExtractMediaMetadata[id=c4e52258-dac5-43b1-b951-2d7f9a7ebf6c] failed to process due to java.lang.NoClassDefFoundError: com/drew/lang/BufferReader; rolling back session: java.lang.NoClassDefFoundError: com/drew/lang/BufferReader
2016-06-15 15:33:22,679 ERROR [Timer-Driven Process Thread-7] o.a.n.p.media.ExtractMediaMetadata
java.lang.NoClassDefFoundError: com/drew/lang/BufferReader
    at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56) ~[na:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) ~[na:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) ~[na:na]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) ~[na:na]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136) ~[na:na]
    at org.apache.nifi.processors.media.ExtractMediaMetadata.tika_parse(ExtractMediaMetadata.java:239) ~[na:na]
    at org.apache.nifi.processors.media.ExtractMediaMetadata.access$000(ExtractMediaMetadata.java:71) ~[na:na]
    at org.apache.nifi.processors.media.ExtractMediaMetadata$1.process(ExtractMediaMetadata.java:215) ~[na:na]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1806) ~[nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1777) ~[nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.processors.media.ExtractMediaMetadata.onTrigger(ExtractMediaMetadata.java:211) ~[na:na]
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) ~[nifi-api-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1139) [nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:139) [nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:49) [nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:124) [nifi-framework-core-0.6.0-SNAPSHOT.jar:0.6.0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_74]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_74]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_74]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_74]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_74]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_74]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
Caused by: java.lang.ClassNotFoundException: com.drew.lang.BufferReader
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_74]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_74]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_74]
    ... 23 common frames omitted


[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...

2016-06-15 Thread jskora
Github user jskora commented on the issue:

https://github.com/apache/nifi/pull/252
  
@JPercivall, I'll try to get this updated tomorrow morning.  This was 
originally done in February, but I'm hoping the rebase won't be too bad.


[GitHub] nifi issue #252: NIFI-615 - Create a processor to extract WAV file character...

2016-06-15 Thread JPercivall
Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/252
  
Gonna review this for potential inclusion in 0.7.0.

There are merge conflicts, @jskora can you rebase it?

