[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858006#comment-15858006 ]

Sandeepan commented on TIKA-1422:

[~thaichat04] I am also getting different results when using Tesseract through Tika on Mac and Ubuntu. From the command line, it gives the same result on both platforms. Were you able to find the reason?

> org.apache.tika.parser.mail.RFC822ParserTest fails
> --
>
> Key: TIKA-1422
> URL: https://issues.apache.org/jira/browse/TIKA-1422
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Labels: memex
> Fix For: 1.7
>
> Attachments: TIKA-1422.Mattmann.100114.patch.txt,
> TIKA-1422.Mattmann.100414.patch.txt, TIKA-1422.oleg.20141021.patch,
> TIKA-1422.palsulich.100414.patch, TIKA-1422.palsulich.100714.patch
>
>
> I'm seeing test failures from:
> {noformat}
> Results :
> Failed tests: testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): (..)
> Tests run: 538, Failures: 1, Errors: 0, Skipped: 1
> {noformat}
> CentOS6 VM image, running:
> {noformat}
> [mattmann@memex tika]$ java -version
> java version "1.7.0_67"
> Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
> [mattmann@memex tika]$ mvn -version
> Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 2014-02-14T09:37:52-08:00)
> Maven home: /usr/share/apache-maven
> Java version: 1.7.0_65, vendor: Oracle Corporation
> Java home: /data/home/mattmann/dist/jdk1.7.0_65/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "2.6.32-431.23.3.el6.centos.plus.x86_64", arch: "amd64", family: "unix"
> [mattmann@memex tika]$
> {noformat}
> Here are the surefire reports - no clue what's up here:
> {noformat}
> [mattmann@memex tika]$ more tika-parsers/target/surefire-reports/org.apache.tika.parser.mail.RFC822ParserTest.txt
> ---
> Test set: org.apache.tika.parser.mail.RFC822ParserTest
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.699 sec <<< FAILURE!
> testMultipart(org.apache.tika.parser.mail.RFC822ParserTest) Time elapsed: 0.152 sec <<< FAILURE!
> org.mockito.exceptions.verification.TooManyActualInvocations:
> xHTMLContentHandler.startElement(
>     "http://www.w3.org/1999/xhtml",
>     "div",
>     "div",
>     isA(org.xml.sax.Attributes)
> );
> Wanted 4 times but was 5
> at org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:87)
> Caused by: org.mockito.exceptions.cause.UndesiredInvocation:
> Undesired invocation:
> at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
> at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
> at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
> at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
> at org.apache.tika.sax.xpath.MatchingContentHandler.startElement(MatchingContentHandler.java:60)
> at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
> at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
> at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
> at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
> at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
> at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
> at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:284)
> at org.apache.tika.parser.ocr.TesseractOCRParser.extractOutput(TesseractOCRParser.java:243)
> at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:155)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
> at org.apache.tika.parser.mail.MailContentHandler.body(MailContentHandler.java:102)
> at org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:133)
> at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:76)
> at org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:84)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
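The quoted trace pins down the failure. Mockito expected exactly four `startElement` calls for an XHTML `div`, and the fifth came from `TesseractOCRParser.extractOutput`, reached through `MailContentHandler.body` and `CompositeParser.parse`. Whether the test passes therefore appears to depend on whether a body part gets OCR'd on the build machine. The JDK-only sketch below reproduces just the counting. `DivCounter` and its `simulate` helper are hypothetical names, not Tika code, and they replay synthetic SAX events instead of parsing a real message.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch (not Tika code): a plain SAX handler that counts
// XHTML <div> start events, the quantity RFC822ParserTest verifies with Mockito.
public class DivCounter extends DefaultHandler {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    private int divs = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Count only <div> elements in the XHTML namespace.
        if (XHTML.equals(uri) && "div".equals(localName)) {
            divs++;
        }
    }

    public int getDivs() {
        return divs;
    }

    // Replays baselineDivs <div> events. When ocrRan is true it adds one more,
    // standing in for the <div> opened by TesseractOCRParser.extractOutput.
    static int simulate(int baselineDivs, boolean ocrRan) {
        DivCounter counter = new DivCounter();
        AttributesImpl noAttributes = new AttributesImpl();
        for (int i = 0; i < baselineDivs; i++) {
            counter.startElement(XHTML, "div", "div", noAttributes);
        }
        if (ocrRan) {
            counter.startElement(XHTML, "div", "div", noAttributes);
        }
        // A non-div event must not change the count.
        counter.startElement(XHTML, "p", "p", noAttributes);
        return counter.getDivs();
    }

    public static void main(String[] args) {
        int expected = 4; // the exact count the original test asserted
        System.out.println("no OCR:   " + simulate(expected, false) + " div events");
        System.out.println("with OCR: " + simulate(expected, true) + " div events");
    }
}
```

Any strict `times(4)` check on this count breaks as soon as the OCR path is taken, and the test code cannot fix that without knowing about the environment.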
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183103#comment-14183103 ]

Hudson commented on TIKA-1422:

SUCCESS: Integrated in tika-trunk-jdk1.7 #282 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/282/])
TIKA-1422. Skip checking the number of some handler invocations in the RFC822ParserTest if Tesseract is installed. (tpalsulich: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1634094)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
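Commit r1634094 skips some invocation-count checks when Tesseract is installed. The commit message does not say how installation is detected, so this sketch makes its own assumption: it scans the `PATH` directories for an executable file named `tesseract` (or `tesseract.exe`). `TesseractGuard`, `isOnPath` and `shouldVerifyExactDivCount` are hypothetical names. They are not the committed test code and not Tika API.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical sketch: decide whether an exact <div> count can be asserted,
// based on whether a "tesseract" executable is visible on PATH.
public class TesseractGuard {

    // Returns true if an executable file named exe (or exe + ".exe") exists
    // in any directory listed in pathEnv. Null or empty pathEnv returns false.
    static boolean isOnPath(String pathEnv, String exe) {
        if (pathEnv == null || pathEnv.isEmpty()) {
            return false;
        }
        for (String dir : pathEnv.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : new String[] {exe, exe + ".exe"}) {
                File candidate = new File(dir, name);
                if (candidate.isFile() && candidate.canExecute()) {
                    return true;
                }
            }
        }
        return false;
    }

    // Strict counting is only safe when OCR cannot run at all.
    static boolean shouldVerifyExactDivCount(String pathEnv) {
        return !isOnPath(pathEnv, "tesseract");
    }

    public static void main(String[] args) {
        if (shouldVerifyExactDivCount(System.getenv("PATH"))) {
            System.out.println("tesseract not found: verify exactly 4 <div> start events");
        } else {
            System.out.println("tesseract found: skip the exact <div> count check");
        }
    }
}
```

A `PATH` scan only shows that the binary exists. It does not show that the binary will produce output for a given image. Tyler Palsulich's Mac run with Tesseract 3.02.02 saw four `div` events where five were expected, which is one reason to skip the exact check instead of expecting a fixed count of 5.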
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183161#comment-14183161 ]

Hudson commented on TIKA-1422:

SUCCESS: Integrated in tika-trunk-jdk1.6 #262 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/262/])
TIKA-1422. Skip checking the number of some handler invocations in the RFC822ParserTest if Tesseract is installed. (tpalsulich: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1634094)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178186#comment-14178186 ]

Hong-Thai Nguyen commented on TIKA-1422:

Applied the latest fix in r1633325 with some formatting. Thanks.
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178197#comment-14178197 ]

Hudson commented on TIKA-1422:

FAILURE: Integrated in tika-trunk-jdk1.7 #273 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/273/])
TIKA-1422 - Apply fix of [~olegt] in Windows (thaichat04: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1633325)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178226#comment-14178226 ]

Hudson commented on TIKA-1422:

SUCCESS: Integrated in tika-trunk-jdk1.6 #253 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/253/])
TIKA-1422 - Fixing build minor refactory of naming test class (thaichat04: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=161)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java
TIKA-1422 - Apply fix of [~olegt] in Windows (thaichat04: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1633325)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178257#comment-14178257 ]

Hudson commented on TIKA-1422:

SUCCESS: Integrated in tika-trunk-jdk1.7 #274 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/274/])
TIKA-1422 - Fixing build minor refactory of naming test class (thaichat04: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=161)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178866#comment-14178866 ] 

Tyler Palsulich commented on TIKA-1422:
---

{code}
Results :

Failed tests: testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): (..)

Tests run: 546, Failures: 1, Errors: 0, Skipped: 4
{code}

{code}
Wanted 5 times but was 4
    at org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:93)
Caused by: org.mockito.exceptions.cause.TooLittleInvocations:
{code}

Still getting a failing test with Tesseract 3.02.02 installed on Mac. Will look into this more tomorrow. But, thank you, [~o...@apache.org] and [~thaichat04]!
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173537#comment-14173537 ] 

Hong-Thai Nguyen commented on TIKA-1422:

I'm not using Tesseract.
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173839#comment-14173839 ] 

Tyler Palsulich commented on TIKA-1422:
---

Can you check what {{%ErrorLevel%}} is when you try to run Tesseract from the command line?
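Tyler's question is about the exit status Tesseract returns. From Java, the equivalent check is to start the process and read its exit code. The helper below is a minimal sketch with hypothetical names (`CommandCheck`, `exitCode`, `runsCleanly`); it is not Tika's own `ExternalParser.check`, and it assumes that any exit status other than 0 means failure, which may be stricter than what Tika actually accepts.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika: reports whether an external command
// starts and exits with status 0, the Java-side analogue of checking
// %ErrorLevel% (Windows cmd) or $? (Unix shells) after running it by hand.
final class CommandCheck {

    static int exitCode(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout so a single drain loop covers both streams.
        pb.redirectErrorStream(true);
        Process p = pb.start();
        // Drain the output so the child cannot block on a full pipe buffer.
        try (InputStream in = p.getInputStream()) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // discard
            }
        }
        return p.waitFor();
    }

    static boolean runsCleanly(String... command) {
        try {
            return exitCode(command) == 0;
        } catch (IOException e) {
            // Command not found or not executable.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Calling `runsCleanly` with the same arguments you use by hand on each platform (Mac, Windows, Ubuntu) would show whether the exit codes differ between them.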
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172796#comment-14172796 ] 

Tyler Palsulich commented on TIKA-1422:
---

What version of Tesseract do you have installed, [~thaichat04]? I'm also getting different results between Mac/Windows/Ubuntu.
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168704#comment-14168704 ] 

Chris A. Mattmann commented on TIKA-1422:
-

OK, I went ahead and created TIKA-1445 for the ImageParser part. Tyler's and my latest patch seems to fix the test issue. I'm going to commit that to deal with TIKA-1422, then take the Image parsing met out of this patch and deal with it in TIKA-1445.
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168708#comment-14168708 ] 

Chris A. Mattmann commented on TIKA-1422:
-

Okey dok, I went ahead and committed this in r1631206, since it seems to fix the test with and without Tesseract. I'll address the image parsing issues in TIKA-1445. Thanks, Tyler and everyone!
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168713#comment-14168713 ] 

Hudson commented on TIKA-1422:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #259 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/259/])
Fix for TIKA-1422 contributed by tpalsulich and mattmann. (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1631206)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168722#comment-14168722 ] 

Hudson commented on TIKA-1422:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #239 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/239/])
Fix for TIKA-1422 contributed by tpalsulich and mattmann. (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1631206)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166903#comment-14166903 ]

Chris A. Mattmann commented on TIKA-1422:
-----------------------------------------

ok figured it out. It was _this_ part of your combined patch:

{noformat}
@@ -139,8 +157,7 @@
         if (!ExternalParser.check(checkCmd))
             return;
         XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
-        xhtml.startDocument();
-
+
         TemporaryResources tmp = new TemporaryResources();
         File output = null;
         try {
@@ -167,7 +184,6 @@
             output.delete();
         }
-        xhtml.endDocument();
     }
{noformat}

and then this part:

{noformat}
@@ -241,19 +257,21 @@
      * @throws IOException if an input error occurred
      */
     private void extractOutput(InputStream stream, XHTMLContentHandler xhtml)
-            throws SAXException, IOException {
-
+        throws SAXException, IOException {
+
         Reader reader = new InputStreamReader(stream, "UTF-8");
+        xhtml.startDocument();
+        xhtml.startElement("div");
         try {
-            xhtml.startElement("div");
             char[] buffer = new char[1024];
             for (int n = reader.read(buffer); n != -1; n = reader.read(buffer)) {
-                xhtml.characters(buffer, 0, n);
+                if (n > 0) xhtml.characters(buffer, 0, n);
             }
-            xhtml.endElement("div");
         } finally {
             reader.close();
         }
+        xhtml.endElement("div");
+        xhtml.endDocument();
     }
{noformat}

That fixed it. That portion above causes the XHTML handler to *only* be invoked when there *is* actual output, getting back to 4 times no matter what.
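As quoted, the patch still calls `xhtml.startElement("div")` unconditionally once the reader is open and only skips zero-length `characters` calls. If the goal is for the OCR parser to touch the handler only when Tesseract actually returns text (the fourth option Tyler Palsulich lists in this thread), one way is to buffer the output first and emit the wrapper only when it is non-blank. This is a minimal sketch, assuming plain JDK SAX types (`org.xml.sax.ContentHandler`, `DefaultHandler`) in place of Tika's `XHTMLContentHandler`; `extractOutputIfNonEmpty`, `DivCounter` and `countDivs` are hypothetical names, not Tika API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class OcrOutputSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /**
     * Hypothetical variant of extractOutput: reads all OCR text first and
     * emits the wrapping <div> only if that text is non-blank.
     */
    static void extractOutputIfNonEmpty(InputStream stream, ContentHandler handler)
            throws IOException, SAXException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            for (int n = reader.read(buffer); n != -1; n = reader.read(buffer)) {
                text.append(buffer, 0, n);
            }
        }
        if (text.toString().trim().isEmpty()) {
            return; // nothing recognized: no <div>, so no extra handler call
        }
        char[] chars = text.toString().toCharArray();
        handler.startElement(XHTML, "div", "div", new AttributesImpl());
        handler.characters(chars, 0, chars.length);
        handler.endElement(XHTML, "div", "div");
    }

    /** Counts XHTML <div> start events, roughly what verify(handler, times(n)) checks. */
    static class DivCounter extends DefaultHandler {
        int divs = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (XHTML.equals(uri) && "div".equals(localName)) {
                divs++;
            }
        }
    }

    /** Runs the sketch on the given OCR text and returns how many <div>s were emitted. */
    static int countDivs(String ocrText) {
        DivCounter counter = new DivCounter();
        try {
            extractOutputIfNonEmpty(
                    new ByteArrayInputStream(ocrText.getBytes(StandardCharsets.UTF_8)), counter);
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return counter.divs;
    }

    public static void main(String[] args) {
        System.out.println(countDivs("Happy New Year!")); // 1
        System.out.println(countDivs("  \n"));            // 0
    }
}
```

Buffering gives up streaming in exchange for deciding before anything is written; OCR output for a single image is usually small, so that trade-off is likely acceptable, but it is a design choice rather than what the quoted patch does.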
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166360#comment-14166360 ]

Chris A. Mattmann commented on TIKA-1422:
-----------------------------------------

Looks like it was 4 times now with Tesseract installed? Huh?
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166359#comment-14166359 ]

Chris A. Mattmann commented on TIKA-1422:
-----------------------------------------

Hey [~tpalsulich], the combined patch is failing for me on my Mac with Tesseract installed:

{noformat}
[chipotle:~/src/tika] mattmann% tesseract
Usage:tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.

Single options:
  -v --version: version info
  --list-langs: list available languages for tesseract engine
[chipotle:~/src/tika] mattmann% tesseract
{noformat}

{noformat}
Results :

Failed tests:   testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): (..)

Tests run: 542, Failures: 1, Errors: 0, Skipped: 1

[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent . SUCCESS [  2.176 s]
[INFO] Apache Tika core ... SUCCESS [ 16.806 s]
[INFO] Apache Tika parsers FAILURE [01:32 min]
[INFO] Apache Tika XMP SKIPPED
[INFO] Apache Tika serialization .. SKIPPED
[INFO] Apache Tika application SKIPPED
[INFO] Apache Tika OSGi bundle SKIPPED
[INFO] Apache Tika server . SKIPPED
[INFO] Apache Tika translate .. SKIPPED
[INFO] Apache Tika examples ... SKIPPED
[INFO] Apache Tika Java-7 Components .. SKIPPED
[INFO] Apache Tika SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 01:52 min
[INFO] Finished at: 2014-10-09T20:59:49-07:00
[INFO] Final Memory: 34M/178M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project tika-parsers: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/mattmann/src/tika/tika-parsers/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tika-parsers
[chipotle:~/src/tika] mattmann% svn status
?       tika-example/target
?       tika-java7/target
M       tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
M       tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
M       tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java
[chipotle:~/src/tika] mattmann% more tika-parsers/target/surefire-reports/org.apache.tika.parser.mail.RFC822ParserTest.txt
---
Test set: org.apache.tika.parser.mail.RFC822ParserTest
---
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.853 sec FAILURE!
testMultipart(org.apache.tika.parser.mail.RFC822ParserTest)  Time elapsed: 0.235 sec FAILURE!
org.mockito.exceptions.verification.TooLittleActualInvocations: 
xHTMLContentHandler.startElement(
    "http://www.w3.org/1999/xhtml",
    "div",
    "div",
    isA(org.xml.sax.Attributes)
);
Wanted 5 times but was 4
	at org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:91)
Caused by: org.mockito.exceptions.cause.TooLittleInvocations: 
Too little invocations:
	at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163704#comment-14163704 ]

Tyler Palsulich commented on TIKA-1422:
---------------------------------------

With my patch from yesterday, all tests are passing with or without tesseract on my computer (tesseract 3.03, java 1.7). Can anyone else confirm?
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157068#comment-14157068 ]

Tyler Palsulich commented on TIKA-1422:
---------------------------------------

[~chrismattmann], I believe that patch fails when Tesseract is not installed. When Tesseract is not installed, the ContentHandler in question is only invoked 4 times. But, when Tesseract is installed, it's invoked 5 times.

My first thought was that the Tesseract Parser always invoked the ContentHandler, even if no OCR text was found. But, there *is* OCR text to be found in this test -- several "Happy New Year!" messages.

So, there are a few ways I can see fixing this test:
1. Just remove the offending line in the test.
2. Allow either 4 or 5 invocations of the handler.
3. Check if Tesseract is installed, checking for 4 or 5 invocations based on the result.
4. Update the image used in the test to have no text and update the TesseractParser to only invoke the handler when it finds content.

I would like the third option the most, but I don't like the idea of checking for an external dependency in an otherwise unrelated test. On the other hand, it has the advantage of widening the scope of the test as little as possible. I'd like the fourth option the most, but it requires some funky logic in TesseractOCRParser.

Thoughts?
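The third option could look roughly like this: probe for the `tesseract` binary once and derive the expected `<div>` count from the result. This is a minimal JDK-only sketch under stated assumptions: `canRun` and `expectedDivCount` are hypothetical helpers, the probe only checks whether the process can be started (using the `-v` flag listed in the Tesseract usage text quoted in this thread) and deliberately ignores its exit status, and the 4/5 counts are the ones reported in the comment.

```java
import java.io.IOException;
import java.io.InputStream;

class TesseractCheckSketch {
    /**
     * Hypothetical helper: returns true if the command can be started at all,
     * i.e. the binary is on the PATH. The exit status is ignored on purpose.
     * (Inside Tika, the parser itself uses ExternalParser.check(...).)
     */
    static boolean canRun(String... command) {
        try {
            Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
            // Drain the output so the child cannot block on a full pipe.
            try (InputStream out = p.getInputStream()) {
                byte[] buf = new byte[1024];
                while (out.read(buf) != -1) {
                    // discard
                }
            }
            p.waitFor();
            return true;
        } catch (IOException e) {
            return false; // typically "Cannot run program": not installed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Option 3: one extra <div> is expected when the OCR parser handles the image part. */
    static int expectedDivCount(boolean tesseractInstalled) {
        return tesseractInstalled ? 5 : 4;
    }

    public static void main(String[] args) {
        boolean installed = canRun("tesseract", "-v");
        System.out.println("tesseract found: " + installed
                + ", expecting " + expectedDivCount(installed) + " <div> elements");
    }
}
```

In the real test the result would feed Mockito's `times(...)`, for example `verify(handler, times(expectedDivCount(canRun("tesseract", "-v")))).startElement(...)`, which keeps the assertion exact on both kinds of machine at the cost of the external-dependency check mentioned above.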
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146283#comment-14146283 ]

Tim Allison commented on TIKA-1422:
-----------------------------------

While work is going on to get the TesseractOCRParser tests to pass on systems with and without Tesseract, would it be possible to temporarily ignore or comment out the things that are causing failures so that trunk will build cleanly?

I got a clean build if I removed TesseractOCRParser from the services list and commented out this line in TikaMimeTypesTest:

{noformat}
assertEquals("org.apache.tika.parser.ocr.TesseractOCRParser", bmp.get(parser));
{noformat}

To be clear, I'm extremely grateful for all of the work that has gone into integrating OCR, and apologies if you are just about to commit the fixes!
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146523#comment-14146523 ]

Tyler Palsulich commented on TIKA-1422:
---------------------------------------

The Hudson builds are now stable with the fix from TIKA-1421. So, this is only a failure when Tesseract is installed. It has something to do with how attachments are parsed, but I'm not sure exactly what this test is or why it's failing.

As I understand it, there are 4 invocations of the handler without Tesseract installed and 5 with. So, it may not be an actual problem... But, if you think we should disable it temporarily, that's fine by me! We could also comment out the failing Assert in this test.
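The "4 without Tesseract, 5 with" reading can be illustrated with a toy model. In the stack trace from the original report, the extra `startElement("div")` comes from `TesseractOCRParser.extractOutput`, reached through `MailContentHandler.body`, so the count depends on whether an OCR parser is available for the image part. The part list below (four text parts and one image) is invented purely to reproduce the reported numbers and is not read from the test's message file; `DivCountSketch`, `countDivs` and `acceptable` are hypothetical names, with `acceptable` standing in for the "allow either 4 or 5" option.

```java
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class DivCountSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Counts startElement(XHTML, "div", ...) events, as the Mockito verify in the test does. */
    static class DivCounter extends DefaultHandler {
        int divs = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (XHTML.equals(uri) && "div".equals(localName)) {
                divs++;
            }
        }
    }

    /**
     * Toy model (hypothetical): each text part gets one <div>; an image part
     * gets one only when an OCR parser is available to produce output for it.
     */
    static int countDivs(List<String> partTypes, boolean ocrAvailable) {
        DivCounter counter = new DivCounter();
        for (String type : partTypes) {
            boolean emitsDiv = type.startsWith("text/")
                    || (type.startsWith("image/") && ocrAvailable);
            if (emitsDiv) {
                counter.startElement(XHTML, "div", "div", new AttributesImpl());
            }
        }
        return counter.divs;
    }

    /** "Allow either 4 or 5 invocations of the handler." */
    static boolean acceptable(int divs) {
        return divs == 4 || divs == 5;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList(
                "text/plain", "text/plain", "text/html", "text/plain", "image/png");
        System.out.println(countDivs(parts, false)); // 4
        System.out.println(countDivs(parts, true));  // 5
    }
}
```

Accepting both counts makes the test environment-independent but also lets a genuine regression of one `<div>` slip through, which is the trade-off against checking for Tesseract explicitly.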
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146537#comment-14146537 ] Tim Allison commented on TIKA-1422: --- Sorry, user error. Needed to force update. Thank you!
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145218#comment-14145218 ] Chris A. Mattmann commented on TIKA-1422: - hmm, good idea Luis. I'll try and explore that in my patch here.
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145244#comment-14145244 ] Luis Filipe Nassif commented on TIKA-1422: -- Also, besides the tests, current trunk will affect users who look only for image metadata, because TesseractOCRParser is now the default image parser and it does not extract metadata. That approach will resolve this problem too. Another approach is to remove TesseractParser from the default service provider parser list and to call it from the existing image parsers if a TesseractOCRConfig object is found in the parseContext object. It is strange to see a TesseractOCRParser in the trace if the user only wants image metadata.
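Luis's alternative (image parsers always extract metadata and run OCR only when a TesseractOCRConfig is present in the ParseContext) might look roughly like the sketch below. It is self-contained and is not Tika code: `Context`, `OcrConfig`, `parseImage` and `runOcr` are hypothetical stand-ins (the class-keyed `set`/`get` only mirrors the idea that ParseContext is keyed by class), and the metadata values and OCR output are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

public class OcrOptInDemo {
    // Hypothetical class-keyed context, loosely mirroring how ParseContext is keyed by class.
    static class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            return key.cast(entries.get(key)); // null if the caller never set it
        }
    }

    // Hypothetical OCR config: its presence in the context means "the caller wants OCR".
    static class OcrConfig {
        final String language;

        OcrConfig(String language) {
            this.language = language;
        }
    }

    // Hypothetical image parser: always fills metadata, runs OCR only on opt-in.
    static String parseImage(Map<String, String> metadata, Context context) {
        metadata.put("width", "640");   // placeholder values standing in for real image metadata
        metadata.put("height", "480");
        OcrConfig ocr = context.get(OcrConfig.class);
        if (ocr == null) {
            return "";                  // metadata-only callers never pay for OCR
        }
        return runOcr(ocr.language);
    }

    // Placeholder for invoking an external OCR engine.
    static String runOcr(String language) {
        return "[ocr text, lang=" + language + "]";
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        String text = parseImage(metadata, new Context());
        System.out.println("metadata only: text='" + text + "', metadata=" + metadata);

        Context withOcr = new Context();
        withOcr.set(OcrConfig.class, new OcrConfig("eng"));
        System.out.println("opted in: text='" + parseImage(new HashMap<>(), withOcr) + "'");
    }
}
```

With this shape, a plain image parse never touches the OCR path, so a metadata-only user would not see an OCR parser in the call chain at all.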
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145253#comment-14145253 ] Chris A. Mattmann commented on TIKA-1422: - I prefer to keep it as an available parser, so I'll have it delegate to the image parsers to grab the metadata as well. Thanks for the suggestion, Luis!
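The delegation Chris describes (TesseractOCRParser stays available but first hands the image to a metadata parser, then adds its OCR text) can be sketched as plain composition. The `SimpleParser` interface and both classes below are hypothetical simplifications, not Tika's Parser API; the `bytes` metadata entry and the OCR string are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

public class OcrDelegationDemo {
    // Hypothetical minimal parser contract, not Tika's Parser interface.
    interface SimpleParser {
        String parse(byte[] image, Map<String, String> metadata);
    }

    // Hypothetical metadata-only image parser; records a placeholder metadata entry.
    static class MetadataOnlyImageParser implements SimpleParser {
        @Override
        public String parse(byte[] image, Map<String, String> metadata) {
            metadata.put("bytes", String.valueOf(image.length));
            return ""; // no text content, metadata only
        }
    }

    // Hypothetical OCR parser that delegates metadata extraction before adding OCR text.
    static class DelegatingOcrParser implements SimpleParser {
        private final SimpleParser metadataParser;

        DelegatingOcrParser(SimpleParser metadataParser) {
            this.metadataParser = metadataParser;
        }

        @Override
        public String parse(byte[] image, Map<String, String> metadata) {
            metadataParser.parse(image, metadata); // same metadata the plain image parser would produce
            return runOcr(image);                  // then the OCR text on top
        }

        // Placeholder for invoking an external OCR engine on the image bytes.
        private String runOcr(byte[] image) {
            return "[ocr text]";
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        SimpleParser parser = new DelegatingOcrParser(new MetadataOnlyImageParser());
        String text = parser.parse(new byte[2048], metadata);
        System.out.println("text=" + text + ", metadata=" + metadata);
    }
}
```

Holding a reference to the metadata parser means whatever it extracts comes along unchanged, without copying its logic into the OCR parser.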
[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145299#comment-14145299 ] Tyler Palsulich commented on TIKA-1422: --- This assumes that if users have Tesseract installed, they want to run it on the image, which may not be true. But I suppose they could call ImageParser directly if they want the performance boost. +1 to delegating to ImageParser to parse the image metadata from TesseractOCRParser.
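Tyler's trade-off is about routing: if images go to the OCR-capable parser whenever Tesseract is installed, callers who only want metadata pay for OCR unless they call the image parser directly. A hypothetical sketch follows. Detecting "installed" by looking for a `tesseract` executable on the PATH is an assumption for illustration (on Windows the file would be `tesseract.exe`), and `metadataOnly`, `withOcr` and `routeImage` are placeholders, not Tika classes.

```java
import java.io.File;
import java.util.function.Function;

public class ImageRoutingDemo {
    // Counts how often the (placeholder) OCR step ran, to make its cost visible.
    static int ocrRuns = 0;

    // Assumption: "installed" means an executable with this name is on the PATH.
    static boolean onPath(String executable) {
        String path = System.getenv("PATH");
        if (path == null) {
            return false;
        }
        for (String dir : path.split(File.pathSeparator)) {
            if (!dir.isEmpty() && new File(dir, executable).canExecute()) {
                return true;
            }
        }
        return false;
    }

    // Placeholder metadata-only parse.
    static String metadataOnly(byte[] image) {
        return "metadata(" + image.length + " bytes)";
    }

    // Placeholder OCR parse: metadata plus the (expensive) OCR step.
    static String withOcr(byte[] image) {
        ocrRuns++;
        return metadataOnly(image) + " + ocr text";
    }

    // Hypothetical routing rule: images go to the OCR parser whenever the engine is installed.
    static Function<byte[], String> routeImage(boolean ocrInstalled) {
        return ocrInstalled ? ImageRoutingDemo::withOcr : ImageRoutingDemo::metadataOnly;
    }

    public static void main(String[] args) {
        byte[] image = new byte[1024];
        boolean installed = onPath("tesseract");
        System.out.println("tesseract on PATH: " + installed);
        System.out.println("routed: " + routeImage(installed).apply(image));
        System.out.println("direct: " + metadataOnly(image)); // metadata-only caller bypasses routing
        System.out.println("OCR runs: " + ocrRuns);
    }
}
```

The same input produces different work depending on the machine, which is consistent with the test behaving differently on a box where Tesseract is installed.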