[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274612#comment-14274612 ] Luis Filipe Nassif commented on TIKA-1511: -- Yes, there are native libs for windows, mac and linux packed into xerial sqlite-jdbc-3.8.7.jar, but there are other wrappers if that is a problem. The license for xerial-jdbc is Apache v2. > Create a parser for SQLite3 > --- > > Key: TIKA-1511 > URL: https://issues.apache.org/jira/browse/TIKA-1511 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 1.6 >Reporter: Luis Filipe Nassif > Fix For: 1.8 > > > I think it would be very useful, as sqlite is used as data storage by a wide > range of applications. Opening the ticket to track it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1222) Tika does not extract attachments from RFC822 files
[ https://issues.apache.org/jira/browse/TIKA-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-1222: - Description: TikaApp --extract option does not extract attachments from RFC822 files. The issue happens because MailContentHandler.body(...) method gets a Parser.class object from the context and calls parser.parse(). It should get a EmbeddedDocumentExtractor.class object from the ParseContext one and call embeddedDocumentExtractor.parseEmbedded(), similar to other Container parsers. (was: TikaCli --extract option does not extract attachments from RFC822 files. The issue happens because MailContentHandler.body(...) method gets a Parser.class object from the context and calls parser.parse(). It should get a EmbeddedDocumentExtractor.class object from the ParseContext one and call embeddedDocumentExtractor.parseEmbedded(), similar to other Container parsers.) Affects Version/s: 1.5 1.6 > Tika does not extract attachments from RFC822 files > --- > > Key: TIKA-1222 > URL: https://issues.apache.org/jira/browse/TIKA-1222 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.4, 1.5, 1.6 >Reporter: Luis Filipe Nassif > Attachments: Tika-1222.patch > > > TikaApp --extract option does not extract attachments from RFC822 files. The > issue happens because MailContentHandler.body(...) method gets a Parser.class > object from the context and calls parser.parse(). It should get a > EmbeddedDocumentExtractor.class object from the ParseContext one and call > embeddedDocumentExtractor.parseEmbedded(), similar to other Container parsers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
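The change proposed in the description can be illustrated with a small, self-contained sketch. It does not use Tika's real classes: `SketchContext`, `EmbeddedExtractor`, `RecordingExtractor` and `handleBodyPart` are hypothetical stand-ins, assumed here only to show the pattern of looking up an extractor in a class-keyed context and delegating each attachment to it, so that whatever extractor the caller registered (for example one that saves files for an --extract style option) receives the embedded document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EmbeddedLookupSketch {

    /** Stand-in for a class-keyed parse context (hypothetical, not Tika's ParseContext). */
    static final class SketchContext {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast(null) returns null, so a missing entry yields null.
            return key.cast(entries.get(key));
        }
    }

    /** Stand-in for an embedded-document extractor (hypothetical interface). */
    interface EmbeddedExtractor {
        void parseEmbedded(String attachmentName);
    }

    /** Extractor that only records which attachments it was handed. */
    static final class RecordingExtractor implements EmbeddedExtractor {
        final List<String> extracted = new ArrayList<>();

        @Override
        public void parseEmbedded(String attachmentName) {
            extracted.add(attachmentName);
        }
    }

    /** Hands a body part to whichever extractor the caller registered in the context. */
    static void handleBodyPart(SketchContext context, String attachmentName) {
        EmbeddedExtractor extractor = context.get(EmbeddedExtractor.class);
        if (extractor != null) {
            extractor.parseEmbedded(attachmentName);
        }
    }

    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        RecordingExtractor recorder = new RecordingExtractor();
        context.set(EmbeddedExtractor.class, recorder);
        handleBodyPart(context, "report.pdf");
        System.out.println(recorder.extracted);
    }
}
```

The point of the design is that the handler never decides how an attachment is processed; it only forwards it, which is why a caller-supplied extractor can write attachments to disk while the default one parses them inline.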
Re: ExternalParser isn't called
Hello. I don't see this issue on current trunk (r1651112) with either the avi or the mp4 test files when invoking via tika-app. -- Best regards, Konstantin Gribov Mon Jan 12 2015 at 23:05:57, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov>: > So..in trunk right now if you try and run tika against the only > ExternalParser defined (for ffmpeg), the ExternalParser isn’t > called. It’s not for lack of MIME type too (they look right), so > I’m investigating. > > ++ > Chris Mattmann, Ph.D. > Chief Architect > Instrument Software and Science Data Systems Section (398) > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > Office: 168-519, Mailstop: 168-527 > Email: chris.a.mattm...@nasa.gov > WWW: http://sunset.usc.edu/~mattmann/ > ++ > Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > ++ > > > > >
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274159#comment-14274159 ] Nick Burch commented on TIKA-1511: -- Just to be sure, since SQLite doesn't show up in the [Apache Legal FAQ list|http://www.apache.org/legal/resolved.html], it'd probably be worth raising a legal jira (link from [the legal page|http://www.apache.org/legal/resolved.html]) to get confirmation that it's fine to use and to clarify what (if any) notice entry is needed for it. > Create a parser for SQLite3 > --- > > Key: TIKA-1511 > URL: https://issues.apache.org/jira/browse/TIKA-1511 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 1.6 >Reporter: Luis Filipe Nassif > Fix For: 1.8 > > > I think it would be very useful, as sqlite is used as data storage by a wide > range of applications. Opening the ticket to track it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1511: -- Priority: Major (was: Minor) > Create a parser for SQLite3 > --- > > Key: TIKA-1511 > URL: https://issues.apache.org/jira/browse/TIKA-1511 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 1.6 >Reporter: Luis Filipe Nassif > Fix For: 1.8 > > > I think it would be very useful, as sqlite is used as data storage by a wide > range of applications. Opening the ticket to track it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1511: -- Fix Version/s: 1.8 > Create a parser for SQLite3 > --- > > Key: TIKA-1511 > URL: https://issues.apache.org/jira/browse/TIKA-1511 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 1.6 >Reporter: Luis Filipe Nassif >Priority: Minor > Fix For: 1.8 > > > I think it would be very useful, as sqlite is used as data storage by a wide > range of applications. Opening the ticket to track it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274135#comment-14274135 ] Tim Allison commented on TIKA-1511: --- Agreed on the license. I'm able to create and write to a sqlite db with just the jar from maven: {noformat} <dependency> <groupId>org.xerial</groupId> <artifactId>sqlite-jdbc</artifactId> <version>3.8.7</version> </dependency> {noformat} I don't think I have native libs kicking around my system somewhere, or do I? This will add another 4 MB to tika-app/tika-server, but I think that it is worth it... > Create a parser for SQLite3 > --- > > Key: TIKA-1511 > URL: https://issues.apache.org/jira/browse/TIKA-1511 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 1.6 >Reporter: Luis Filipe Nassif >Priority: Minor > > I think it would be very useful, as sqlite is used as data storage by a wide > range of applications. Opening the ticket to track it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
ExternalParser isn't called
So..in trunk right now if you try and run tika against the only ExternalParser defined (for ffmpeg), the ExternalParser isn’t called. It’s not for lack of MIME type too (they look right), so I’m investigating. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273905#comment-14273905 ] Luis Filipe Nassif commented on TIKA-1511: -- I don't see any problems either. I think "public domain" is more liberal than Apache v2, because the authors abdicated their copyright. But sqlite needs native libs. Could that be a problem? > Create a parser for SQLite3 > --- > > Key: TIKA-1511 > URL: https://issues.apache.org/jira/browse/TIKA-1511 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 1.6 >Reporter: Luis Filipe Nassif >Priority: Minor > > I think it would be very useful, as sqlite is used as data storage by a wide > range of applications. Opening the ticket to track it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273869#comment-14273869 ] Tim Allison commented on TIKA-1511: --- See any licensing problems with bundling sqlite dependency? It isn't Apache v2, but what we'd bundle isn't licensed at all ([link|https://www.sqlite.org/copyright.html]). I don't see a problem, but wanted to check to see if anyone has any issues. Thank you for opening this issue! > Create a parser for SQLite3 > --- > > Key: TIKA-1511 > URL: https://issues.apache.org/jira/browse/TIKA-1511 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 1.6 >Reporter: Luis Filipe Nassif >Priority: Minor > > I think it would be very useful, as sqlite is used as data storage by a wide > range of applications. Opening the ticket to track it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: TestMultiPart tests failing
Hi Chris, I'm not getting any test failures from trunk on my Mac. So, I'm also curious what revision you're on. uname -a: Darwin Tylers-MacBook-Pro.local 14.0.0 Darwin Kernel Version 14.0.0: Fri Sep 19 00:26:44 PDT 2014; root:xnu-2782.1.97~2/RELEASE_X86_64 x86_64 tesseract --version: tesseract 3.02.02 leptonica-1.71 libjpeg 8d : libpng 1.6.13 : libtiff 4.0.3 : zlib 1.2.5 java -version: java version "1.7.0_71" Java(TM) SE Runtime Environment (build 1.7.0_71-b14) Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode) Tyler On Mon, Jan 12, 2015 at 8:34 AM, Allison, Timothy B. wrote: > Chris, > > Is this on an updated and/or reverted trunk or on an modified rc-3? > > I haven't gotten around to installing tesseract yet so I can't actually > kick the tires, but the last time there was a test for 5 items on line 91 > of RFC822ParserTest was in r1552405...before the fixes for TIKA-1422. > > But r1552405 doesn't quite seem to fit the error message, which says that > it can't find 5 "div" (if I understand correctly), and in r1552405 the test > was for 5 "p". > > In r161 and 1633325, there is a path through the code to test for 5 > "div" if Tesseract is running, but that isn't occurring on line 91 in those > revisions. 
> > -Original Message- > From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] > Sent: Sunday, January 11, 2015 7:33 PM > To: dev@tika.apache.org > Subject: TestMultiPart tests failing > > Hey Guys, > > I’m on Mac OS X 10.9.4, Java version: > > [chipotle:~/src/tika] mattmann% uname -a > Darwin chipotle.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun 3 > 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64 > [chipotle:~/src/tika] mattmann% java -version > java version "1.7.0_60" > Java(TM) SE Runtime Environment (build 1.7.0_60-b19) > Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode) > [chipotle:~/src/tika] mattmann% > > > > With Tesseract installed: > > [chipotle:~/src/tika] mattmann% tesseract --version > tesseract 3.02.02 > leptonica-1.71 > libjpeg 8d : libpng 1.6.13 : libtiff 4.0.3 : zlib 1.2.5 > > [chipotle:~/src/tika] mattmann% > > > > And the following tests are failing: > > Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 sec > Running org.apache.tika.parser.xml.EmptyAndDuplicateElementsXMLParserTest > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec > Running org.apache.tika.parser.xml.FictionBookParserTest > Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec > Running org.apache.tika.sax.PhoneExtractingContentHandlerTest > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec > Running org.apache.tika.TestParsers > Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.669 sec > > Results : > > Failed tests: > testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): (..) > > Tests run: 572, Failures: 1, Errors: 0, Skipped: 2 > > [INFO] - > > > > Test set: org.apache.tika.parser.mail.RFC822ParserTest > --- > > Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.903 sec > <<< FAILURE! > testMultipart(org.apache.tika.parser.mail.RFC822ParserTest) Time elapsed: > 0.289 sec <<< FAILURE! 
> org.mockito.exceptions.verification.TooLittleActualInvocations: > xHTMLContentHandler.startElement( > "http://www.w3.org/1999/xhtml", > "div", > "div", > isA(org.xml.sax.Attributes) > ); > Wanted 5 times but was 4 > at org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:91) > Caused by: org.mockito.exceptions.cause.TooLittleInvocations: > Too little invocations: > at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) > at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264) > at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254) > at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:291) > at org.apache.tika.parser.mail.MailContentHandler.startBodyPart(MailContentHandler.java:242) > at org.apache.james.mime4j.parser.MimeStr > > > > Ideas? > > Cheers, > Chris > > ++ > Chris Mattmann, Ph.D. > Chief Architect > Instrument Software and Science Data Systems Section (398) > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > Office: 168-519, Mailstop: 168-527 > Email: chris.a.mattm...@nasa.gov > WWW: http://sunset.usc.edu/~mattmann/ > ++ > Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > +
RE: TestMultiPart tests failing
Chris, Is this on an updated and/or reverted trunk or on an modified rc-3? I haven't gotten around to installing tesseract yet so I can't actually kick the tires, but the last time there was a test for 5 items on line 91 of RFC822ParserTest was in r1552405...before the fixes for TIKA-1422. But r1552405 doesn't quite seem to fit the error message, which says that it can't find 5 "div" (if I understand correctly), and in r1552405 the test was for 5 "p". In r161 and 1633325, there is a path through the code to test for 5 "div" if Tesseract is running, but that isn't occurring on line 91 in those revisions. -Original Message- From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Sunday, January 11, 2015 7:33 PM To: dev@tika.apache.org Subject: TestMultiPart tests failing Hey Guys, I’m on Mac OS X 10.9.4, Java version: [chipotle:~/src/tika] mattmann% uname -a Darwin chipotle.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun 3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64 [chipotle:~/src/tika] mattmann% java -version java version "1.7.0_60" Java(TM) SE Runtime Environment (build 1.7.0_60-b19) Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode) [chipotle:~/src/tika] mattmann% With Tesseract installed: [chipotle:~/src/tika] mattmann% tesseract --version tesseract 3.02.02 leptonica-1.71 libjpeg 8d : libpng 1.6.13 : libtiff 4.0.3 : zlib 1.2.5 [chipotle:~/src/tika] mattmann% And the following tests are failing: Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 sec Running org.apache.tika.parser.xml.EmptyAndDuplicateElementsXMLParserTest Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec Running org.apache.tika.parser.xml.FictionBookParserTest Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec Running org.apache.tika.sax.PhoneExtractingContentHandlerTest Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec Running 
org.apache.tika.TestParsers Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.669 sec Results : Failed tests: testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): (..) Tests run: 572, Failures: 1, Errors: 0, Skipped: 2 [INFO] - Test set: org.apache.tika.parser.mail.RFC822ParserTest --- Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.903 sec <<< FAILURE! testMultipart(org.apache.tika.parser.mail.RFC822ParserTest) Time elapsed: 0.289 sec <<< FAILURE! org.mockito.exceptions.verification.TooLittleActualInvocations: xHTMLContentHandler.startElement( "http://www.w3.org/1999/xhtml", "div", "div", isA(org.xml.sax.Attributes) ); Wanted 5 times but was 4 at org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:91) Caused by: org.mockito.exceptions.cause.TooLittleInvocations: Too little invocations: at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264) at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254) at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:291) at org.apache.tika.parser.mail.MailContentHandler.startBodyPart(MailContentHandler.java:242) at org.apache.james.mime4j.parser.MimeStr Ideas? Cheers, Chris ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
[jira] [Commented] (TIKA-1512) WordParser fails on many Word files
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273681#comment-14273681 ] Nick Burch commented on TIKA-1512: -- What about subsequent runs - I'm wondering where the closing quote is for the hyperlink? Also, does the text contain the whole of the URL, or is it truncated at all? If you open the file in Word and do a save-as, does the file then parse properly, or does the problem remain? If you open the file in Word, does the hyperlink work properly in Word? > WordParser fails on many Word files > --- > > Key: TIKA-1512 > URL: https://issues.apache.org/jira/browse/TIKA-1512 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.5, 1.6, 1.7, 1.8 > Environment: Linux 64bit > OpenJDK Runtime Environment (IcedTea 2.4.4) (suse-24.13.5-x86_64) > OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode) > and > java version "1.6.0" > Java(TM) SE Runtime Environment > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 (JIT enabled, AOT > enabled) >Reporter: F Seid >Assignee: Jukka Zitting > > WordParser fail on some word files. A negative value is sent to substring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1512) WordParser fails on many Word files
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273632#comment-14273632 ] F Seid commented on TIKA-1512: -- text var is like this on error (shown via jdb): text = " HYPERLINK "http://foo?bar=t&faa=http://fbb/abc.html&fbb=tata " > WordParser fails on many Word files > --- > > Key: TIKA-1512 > URL: https://issues.apache.org/jira/browse/TIKA-1512 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.5, 1.6, 1.7, 1.8 > Environment: Linux 64bit > OpenJDK Runtime Environment (IcedTea 2.4.4) (suse-24.13.5-x86_64) > OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode) > and > java version "1.6.0" > Java(TM) SE Runtime Environment > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 (JIT enabled, AOT > enabled) >Reporter: F Seid >Assignee: Jukka Zitting > > WordParser fail on some word files. A negative value is sent to substring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1512) WordParser fails on many Word files
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273586#comment-14273586 ] Nick Burch commented on TIKA-1512: -- I worry that might be solving the symptom, not the problem. Any chance you could step into a problem file with a debugger, and let us know what the text of each character run in a problematic paragraph is? > WordParser fails on many Word files > --- > > Key: TIKA-1512 > URL: https://issues.apache.org/jira/browse/TIKA-1512 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.5, 1.6, 1.7, 1.8 > Environment: Linux 64bit > OpenJDK Runtime Environment (IcedTea 2.4.4) (suse-24.13.5-x86_64) > OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode) > and > java version "1.6.0" > Java(TM) SE Runtime Environment > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 (JIT enabled, AOT > enabled) >Reporter: F Seid >Assignee: Jukka Zitting > > WordParser fail on some word files. A negative value is sent to substring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1510) FFMpeg installed but not parsing video files
[ https://issues.apache.org/jira/browse/TIKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273520#comment-14273520 ] Ray Gauss II commented on TIKA-1510: Yes. The only reason I haven't myself is that I've been trying to find some time to refactor the vorbis stuff per the previous [conversation|http://mail-archives.apache.org/mod_mbox/tika-dev/201408.mbox/%3calpine.deb.2.02.1408221155450.8...@urchin.earth.li%3E] with [~gagravarr]. > FFMpeg installed but not parsing video files > > > Key: TIKA-1510 > URL: https://issues.apache.org/jira/browse/TIKA-1510 > Project: Tika > Issue Type: Bug > Components: parser > Environment: FFMPEG, Mac OS X 10.9 with HomeBrew >Reporter: Chris A. Mattmann >Assignee: Chris A. Mattmann > Fix For: 1.7 > > > I have FFMPEG installed with homebrew: > {noformat} > # brew install ffmpeg > {noformat} > I've got some AVI files and have tried to parse them with Tika: > {noformat} > [chipotle:~/Desktop/drone-vids] mattmann% tika -m SPOT11_01\ 17.AVI > Content-Length: 334917340 > Content-Type: video/x-msvideo > X-Parsed-By: org.apache.tika.parser.EmptyParser > resourceName: SPOT11_01 17.AVI > {noformat} > I took a look at the ExternalParser, which is configured for using ffmpeg if > it's installed. It seems it only works on: > {code:xml} > <mime-types> > <mime-type>video/avi</mime-type> > <mime-type>video/mpeg</mime-type> > </mime-types> > {code} > I'll add video/x-msvideo and see if that fixes it. I also stumbled upon the > work by [~rgauss] at Github - Ray I noticed there is no parser in that work: > https://github.com/AlfrescoLabs/tika-ffmpeg > But there seems to be metadata extraction code, etc. Ray should I do > something with this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1512) WordParser fails on many Word files
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273515#comment-14273515 ] F Seid commented on TIKA-1512: -- Sorry, but I cannot share those files. But it helped to put another check in the if clause above WordExtractor.java:407 to verify that lastIndexOf('"') comes after indexOf('"'). > WordParser fails on many Word files > --- > > Key: TIKA-1512 > URL: https://issues.apache.org/jira/browse/TIKA-1512 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.5, 1.6, 1.7, 1.8 > Environment: Linux 64bit > OpenJDK Runtime Environment (IcedTea 2.4.4) (suse-24.13.5-x86_64) > OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode) > and > java version "1.6.0" > Java(TM) SE Runtime Environment > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 (JIT enabled, AOT > enabled) >Reporter: F Seid >Assignee: Jukka Zitting > > WordParser fail on some word files. A negative value is sent to substring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
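The extra check described in the comment above can be sketched as follows. This is a minimal, self-contained illustration and not the actual WordExtractor code: the class and method names (`HyperlinkFieldSketch`, `quotedUrl`) are hypothetical, and it assumes the failure comes from a HYPERLINK field that contains only one double quote, as in the `text` value reported on this issue. In that case indexOf('"') and lastIndexOf('"') return the same position, so substring(first + 1, last) is asked for a range whose start lies past its end and throws.

```java
import java.util.Optional;

final class HyperlinkFieldSketch {

    /**
     * Returns the text between the first and last double quote of a field
     * code, or empty when the quotes are missing or not balanced.
     */
    static Optional<String> quotedUrl(String fieldText) {
        int first = fieldText.indexOf('"');
        int last = fieldText.lastIndexOf('"');
        // With a single quote, first == last, and substring(first + 1, last)
        // would throw StringIndexOutOfBoundsException. Guard against that.
        if (first < 0 || last <= first) {
            return Optional.empty();
        }
        return Optional.of(fieldText.substring(first + 1, last));
    }

    public static void main(String[] args) {
        // Value taken from the jdb output on this issue: no closing quote.
        String broken = " HYPERLINK \"http://foo?bar=t&faa=http://fbb/abc.html&fbb=tata ";
        // Assumed well-formed counterpart, for comparison only.
        String balanced = " HYPERLINK \"http://example.org/\" ";
        System.out.println(quotedUrl(broken).isPresent());
        System.out.println(quotedUrl(balanced).orElse("<none>"));
    }
}
```

As noted elsewhere in this thread, a guard like this may only treat the symptom: it avoids the exception but says nothing about why the closing quote is missing from the character run.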
[jira] [Commented] (TIKA-1512) WordParser fails on many Word files
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273486#comment-14273486 ] Nick Burch commented on TIKA-1512: -- Do you have a very small sample file that triggers the problem? > WordParser fails on many Word files > --- > > Key: TIKA-1512 > URL: https://issues.apache.org/jira/browse/TIKA-1512 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.5, 1.6, 1.7, 1.8 > Environment: Linux 64bit > OpenJDK Runtime Environment (IcedTea 2.4.4) (suse-24.13.5-x86_64) > OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode) > and > java version "1.6.0" > Java(TM) SE Runtime Environment > IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 (JIT enabled, AOT > enabled) >Reporter: F Seid >Assignee: Jukka Zitting > > WordParser fail on some word files. A negative value is sent to substring -- This message was sent by Atlassian JIRA (v6.3.4#6332)