[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2017-02-08 Thread Sandeepan (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858006#comment-15858006
 ] 

Sandeepan commented on TIKA-1422:
-

[~thaichat04] I am also getting different results when using Tesseract through 
Tika on Mac and Ubuntu. From the command line, it gives the same result on both 
platforms. Were you able to find the reason?

> org.apache.tika.parser.mail.RFC822ParserTest fails
> --
>
> Key: TIKA-1422
> URL: https://issues.apache.org/jira/browse/TIKA-1422
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Reporter: Chris A. Mattmann
>Assignee: Chris A. Mattmann
>  Labels: memex
> Fix For: 1.7
>
> Attachments: TIKA-1422.Mattmann.100114.patch.txt, 
> TIKA-1422.Mattmann.100414.patch.txt, TIKA-1422.oleg.20141021.patch, 
> TIKA-1422.palsulich.100414.patch, TIKA-1422.palsulich.100714.patch
>
>
> I'm seeing test failures from:
> {noformat}
> Results :
> Failed tests:   testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): 
> (..)
> Tests run: 538, Failures: 1, Errors: 0, Skipped: 1
> {noformat}
> CentOS6 VM image, running:
> {noformat}
> [mattmann@memex tika]$ java -version
> java version "1.7.0_67"
> Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
> [mattmann@memex tika]$ mvn -version
> Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
> 2014-02-14T09:37:52-08:00)
> Maven home: /usr/share/apache-maven
> Java version: 1.7.0_65, vendor: Oracle Corporation
> Java home: /data/home/mattmann/dist/jdk1.7.0_65/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "2.6.32-431.23.3.el6.centos.plus.x86_64", arch: 
> "amd64", family: "unix"
> [mattmann@memex tika]$ 
> {noformat}
> Here are the surefire reports - no clue what's up here:
> {noformat}
> [mattmann@memex tika]$ more 
> tika-parsers/target/surefire-reports/org.apache.tika.parser.mail.RFC822ParserTest.txt
>  
> ---
> Test set: org.apache.tika.parser.mail.RFC822ParserTest
> ---
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.699 sec <<< 
> FAILURE!
> testMultipart(org.apache.tika.parser.mail.RFC822ParserTest)  Time elapsed: 
> 0.152 sec  <<< FAILURE!
> org.mockito.exceptions.verification.TooManyActualInvocations: 
> xHTMLContentHandler.startElement(
> "http://www.w3.org/1999/xhtml",
> "div",
> "div",
> isA(org.xml.sax.Attributes)
> );
> Wanted 4 times but was 5
>   at 
> org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:87)
> Caused by: org.mockito.exceptions.cause.UndesiredInvocation: 
> Undesired invocation:
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.xpath.MatchingContentHandler.startElement(MatchingContentHandler.java:60)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
>   at 
> org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
>   at 
> org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:284)
>   at 
> org.apache.tika.parser.ocr.TesseractOCRParser.extractOutput(TesseractOCRParser.java:243)
>   at 
> org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:155)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
>   at 
> org.apache.tika.parser.mail.MailContentHandler.body(MailContentHandler.java:102)
>   at 
> org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:133)
>   at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:76)
>   at 
> org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:84)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> 

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183103#comment-14183103
 ] 

Hudson commented on TIKA-1422:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #282 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/282/])
TIKA-1422. Skip checking the number of some handler invocations in the 
RFC822ParserTest if Tesseract is installed. (tpalsulich: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1634094)
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
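
The commit above makes the invocation-count check depend on whether Tesseract is installed. As a rough illustration only, and not the code committed in r1634094, one way to detect the binary is to scan the directories listed in PATH. The class `TesseractGuard` and its methods `isOnPath` and `shouldVerifyDivCount` are hypothetical names introduced for this sketch; how the real test detects Tesseract is not shown in this thread.

```java
import java.io.File;
import java.util.regex.Pattern;

class TesseractGuard {

    /**
     * Returns true if an executable file named `executable` (or
     * `executable.exe`, for Windows) exists in one of the directories
     * listed in `pathValue`. Hypothetical helper, not part of Tika.
     */
    static boolean isOnPath(String executable, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return false;
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File plain = new File(dir, executable);
            File exe = new File(dir, executable + ".exe");
            if ((plain.isFile() && plain.canExecute())
                    || (exe.isFile() && exe.canExecute())) {
                return true;
            }
        }
        return false;
    }

    /**
     * Assumption for this sketch: the exact count of "div" start events is
     * only predictable when no OCR parser adds output of its own.
     */
    static boolean shouldVerifyDivCount(boolean tesseractInstalled) {
        return !tesseractInstalled;
    }

    public static void main(String[] args) {
        boolean installed = isOnPath("tesseract", System.getenv("PATH"));
        System.out.println("tesseract on PATH: " + installed);
        System.out.println("verify exact div count: " + shouldVerifyDivCount(installed));
    }
}
```

In a test, the strict `verify(..., times(4))` style check would then run only when `shouldVerifyDivCount(...)` returns true. This trades some coverage on machines with Tesseract for a build that passes in both environments.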


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183161#comment-14183161
 ] 

Hudson commented on TIKA-1422:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #262 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/262/])
TIKA-1422. Skip checking the number of some handler invocations in the 
RFC822ParserTest if Tesseract is installed. (tpalsulich: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1634094)
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178186#comment-14178186
 ] 

Hong-Thai Nguyen commented on TIKA-1422:


Applied the latest fix in r1633325 with some formatting. Thanks.

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178197#comment-14178197
 ] 

Hudson commented on TIKA-1422:
--

FAILURE: Integrated in tika-trunk-jdk1.7 #273 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/273/])
TIKA-1422 - Apply fix of [~olegt] in Windows (thaichat04: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1633325)
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178226#comment-14178226
 ] 

Hudson commented on TIKA-1422:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #253 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/253/])
TIKA-1422 - Fixing build & minor refactory of naming test class (thaichat04: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=161)
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java
TIKA-1422 - Apply fix of [~olegt] in Windows (thaichat04: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1633325)
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178257#comment-14178257
 ] 

Hudson commented on TIKA-1422:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #274 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/274/])
TIKA-1422 - Fixing build & minor refactory of naming test class (thaichat04: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=161)
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178866#comment-14178866
 ] 

Tyler Palsulich commented on TIKA-1422:
---

{code}
Results :

Failed tests:   testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): 
(..)

Tests run: 546, Failures: 1, Errors: 0, Skipped: 4
{code}
{code}
Wanted 5 times but was 4
at 
org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:93)
Caused by: org.mockito.exceptions.cause.TooLittleInvocations:
{code}

Still getting a failing test with Tesseract 3.02.02 installed on Mac. Will look 
into this more tomorrow. But, thank you, [~o...@apache.org] and [~thaichat04]!
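
The two reports in this thread disagree in opposite directions (wanted 4 div starts but saw 5 in the original report; wanted 5 but saw 4 above), so it may help to count the div events directly on each machine. Below is a minimal sketch using only the JDK SAX API. The class name DivCounter is hypothetical and this is not the code in RFC822ParserTest.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: counts how many <div> start events it receives.
final class DivCounter extends DefaultHandler {
    private int divs;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Only the local name is compared; the namespace URI is ignored in this sketch.
        if ("div".equals(localName)) {
            divs++;
        }
    }

    int getDivs() {
        return divs;
    }
}
```

Passing such a handler to the parser in place of the Mockito mock, and printing getDivs() on a machine with Tesseract and on one without, should show whether the extra div comes from the OCR parser seen in the stack trace.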


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-16 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173537#comment-14173537
 ] 

Hong-Thai Nguyen commented on TIKA-1422:


I'm not using Tesseract


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-16 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173839#comment-14173839
 ] 

Tyler Palsulich commented on TIKA-1422:
---

Can you check what {{%ErrorLevel%}} is when you try to run Tesseract from the 
command line?
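
The same check can be made from Java, which is closer to how a parser would launch the binary. This is a minimal sketch, not Tika's own code: the class and method names are hypothetical, and -1 is an arbitrary marker for "could not start the process".

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper: runs a command and reports its exit code.
final class ExitCodeCheck {
    /** Returns the process exit code, or -1 if the command could not be started or the wait was interrupted. */
    static int exitCodeOf(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO(); // let the child write to our console so its output cannot block it
        try {
            Process process = builder.start();
            return process.waitFor();
        } catch (IOException e) {
            // Typically means the binary is not installed or not on the PATH.
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

For example, ExitCodeCheck.exitCodeOf(Arrays.asList("tesseract", "-v")) would be expected to return 0 when Tesseract is on the PATH and -1 when it is not installed.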


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-15 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172796#comment-14172796
 ] 

Tyler Palsulich commented on TIKA-1422:
---

What version of Tesseract do you have installed, [~thaichat04]? I'm also 
getting different results between Mac/Windows/Ubuntu.


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-12 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168704#comment-14168704
 ] 

Chris A. Mattmann commented on TIKA-1422:
-

OK, I went ahead and created TIKA-1445 for the ImageParser part. Tyler's and my 
latest patch seems to fix the test issue. I'm going to commit that to resolve 
TIKA-1422, take the image parsing metadata changes out of this patch, and deal 
with them in TIKA-1445.


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-12 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168708#comment-14168708
 ] 

Chris A. Mattmann commented on TIKA-1422:
-

Okey dok, I went ahead and committed this in r1631206, since it seems to fix the 
test with or without Tesseract installed. I'll address the image parsing issues 
in TIKA-1445. Thanks Tyler and everyone!


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168713#comment-14168713
 ] 

Hudson commented on TIKA-1422:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #259 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/259/])
Fix for TIKA-1422 contributed by tpalsulich and mattmann. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1631206)
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java



[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168722#comment-14168722
 ] 

Hudson commented on TIKA-1422:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #239 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/239/])
Fix for TIKA-1422 contributed by tpalsulich and mattmann. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1631206)
* 
/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* 
/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java



[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-10 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166903#comment-14166903
 ] 

Chris A. Mattmann commented on TIKA-1422:
-

ok figured it out. It was _this_ part of your combined patch:

{noformat}
@@ -139,8 +157,7 @@
 if (!ExternalParser.check(checkCmd)) return;

XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
-   xhtml.startDocument();
-   
+
 TemporaryResources tmp = new TemporaryResources();
 File output = null;
 try {
@@ -167,7 +184,6 @@
output.delete();
 
 }
-xhtml.endDocument();
 }
 
{noformat}

and then this part:

{noformat}
@@ -241,19 +257,21 @@
  * @throws IOException if an input error occurred
  */
 private void extractOutput(InputStream stream, XHTMLContentHandler xhtml)
-throws SAXException, IOException {
-   
+   throws SAXException, IOException {
+ 
 Reader reader = new InputStreamReader(stream, "UTF-8");
+xhtml.startDocument();
+xhtml.startElement("div");
 try {
-xhtml.startElement("div");
 char[] buffer = new char[1024];
 for (int n = reader.read(buffer); n != -1; n = reader.read(buffer)) {
-xhtml.characters(buffer, 0, n);
+if (n > 0) xhtml.characters(buffer, 0, n);
 }
-xhtml.endElement("div");
 } finally {
 reader.close();
 }
+xhtml.endElement("div");
+xhtml.endDocument();
 }
 
{noformat}

That fixed it. That portion above causes the XHTML handler to *only* be invoked 
when there *is* actual output, getting back to 4 times no matter what.
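
For readers following along, here is a minimal stand-alone sketch of the "only emit a div when there is actual output" idea. It is not the committed Tika code: the class OcrOutputSketch, the helper emitOcrText and the DivCounter handler are hypothetical names, and a plain SAX DefaultHandler from the JDK stands in for the mocked XHTMLContentHandler in the test.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class OcrOutputSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Counts div start events; a stand-in for the mocked handler (hypothetical). */
    static class DivCounter extends DefaultHandler {
        int divs = 0;
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("div".equals(localName)) {
                divs++;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Wraps the reader's content in one div, but only if any characters were read. */
    static void emitOcrText(Reader reader, ContentHandler handler)
            throws IOException, SAXException {
        char[] buffer = new char[1024];
        boolean opened = false;
        try {
            for (int n = reader.read(buffer); n != -1; n = reader.read(buffer)) {
                if (n > 0) {
                    if (!opened) {
                        // Open the div lazily, on the first non-empty read.
                        handler.startElement(XHTML, "div", "div", new AttributesImpl());
                        opened = true;
                    }
                    handler.characters(buffer, 0, n);
                }
            }
        } finally {
            reader.close();
        }
        if (opened) {
            handler.endElement(XHTML, "div", "div");
        }
    }

    public static void main(String[] args) throws Exception {
        DivCounter counter = new DivCounter();
        emitOcrText(new StringReader(""), counter);                // no OCR output: no div
        emitOcrText(new StringReader("Happy New Year!"), counter); // OCR output: one div
        System.out.println(counter.divs + " div(s), text: " + counter.text);
    }
}
```

With a handler like this, an empty OCR result adds nothing to the div count, so the count seen by a test no longer depends on whether the OCR step ran without producing text.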



[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-09 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166360#comment-14166360
 ] 

Chris A. Mattmann commented on TIKA-1422:
-

Looks like it was 4 times now with Tesseract installed? Huh?


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-09 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166359#comment-14166359
 ] 

Chris A. Mattmann commented on TIKA-1422:
-

Hey [~tpalsulich] the combined patch is failing for me on my Mac with Tesseract 
installed:

{noformat}
[chipotle:~/src/tika] mattmann% tesseract
Usage:tesseract imagename outputbase [-l lang] [-psm pagesegmode] 
[configfile...]

pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.

Single options:
  -v --version: version info
  --list-langs: list available languages for tesseract engine
[chipotle:~/src/tika] mattmann% tesseract

{noformat}

{noformat}

Results :

Failed tests:   testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): 
(..)

Tests run: 542, Failures: 1, Errors: 0, Skipped: 1

[INFO] 
[INFO] Reactor Summary:
[INFO] 
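[INFO] Apache Tika parent ................................. SUCCESS [  2.176 s]
[INFO] Apache Tika core ................................... SUCCESS [ 16.806 s]
[INFO] Apache Tika parsers ................................ FAILURE [01:32 min]
[INFO] Apache Tika XMP .................................... SKIPPED
[INFO] Apache Tika serialization .......................... SKIPPED
[INFO] Apache Tika application ............................ SKIPPED
[INFO] Apache Tika OSGi bundle ............................ SKIPPED
[INFO] Apache Tika server ................................. SKIPPED
[INFO] Apache Tika translate .............................. SKIPPED
[INFO] Apache Tika examples ............................... SKIPPED
[INFO] Apache Tika Java-7 Components ...................... SKIPPED
[INFO] Apache Tika ........................................ SKIPPED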
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:52 min
[INFO] Finished at: 2014-10-09T20:59:49-07:00
[INFO] Final Memory: 34M/178M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on 
project tika-parsers: There are test failures.
[ERROR] 
[ERROR] Please refer to 
/Users/mattmann/src/tika/tika-parsers/target/surefire-reports for the 
individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tika-parsers
[chipotle:~/src/tika] mattmann% svn status
?   tika-example/target
?   tika-java7/target
M   
tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
M   
tika-parsers/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java
M   
tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java
[chipotle:~/src/tika] mattmann% more 
tika-parsers/target/surefire-reports/org.apache.tika.parser.mail.RFC822ParserTest.txt
 
---
Test set: org.apache.tika.parser.mail.RFC822ParserTest
---
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.853 sec <<< FAILURE!
testMultipart(org.apache.tika.parser.mail.RFC822ParserTest)  Time elapsed: 0.235 sec  <<< FAILURE!
org.mockito.exceptions.verification.TooLittleActualInvocations: 
xHTMLContentHandler.startElement(
    "http://www.w3.org/1999/xhtml",
    "div",
    "div",
    isA(org.xml.sax.Attributes)
);
Wanted 5 times but was 4
    at org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:91)
Caused by: org.mockito.exceptions.cause.TooLittleInvocations: 
Too little invocations:
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at 

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-08 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163704#comment-14163704
 ] 

Tyler Palsulich commented on TIKA-1422:
---

With my patch from yesterday, all tests are passing with or without tesseract 
on my computer (tesseract 3.03, java 1.7). Can anyone else confirm?


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-02 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157068#comment-14157068
 ] 

Tyler Palsulich commented on TIKA-1422:
---

[~chrismattmann], I believe that patch fails when Tesseract is not installed. 
When Tesseract is not installed, the ContentHandler in question is only invoked 
4 times. But, when Tesseract is installed, it's invoked 5 times.

My first thought was that the Tesseract Parser always invoked the 
ContentHandler, even if no OCR text was found. But, there *is* OCR text to be 
found in this test -- several "Happy New Year!" messages. So, there are a few 
ways I can see fixing this test:

1. Just remove the offending line in the test.
2. Allow either 4 or 5 invocations of the handler.
3. Check if Tesseract is installed, checking for 4 or 5 invocations based on 
the result.
4. Update the image used in the test to have no text and update the 
TesseractParser to only invoke the handler when it finds content.

I would like the third option the most, but I don't like the idea of checking 
for an external dependency in an otherwise unrelated test. On the other hand, 
it has the advantage of widening the scope of the test as little as possible.
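
As a rough illustration of the third option, the sketch below picks the expected invocation count from whether a tesseract command can be started. It is only a sketch under assumptions: TesseractCheckSketch, canRun and expectedDivCount are hypothetical names, and the ProcessBuilder probe is a plain-JDK stand-in for whatever availability check the parser itself uses.

```java
import java.io.IOException;
import java.io.InputStream;

class TesseractCheckSketch {

    /** Returns true if the command can be started at all (hypothetical helper). */
    static boolean canRun(String... command) {
        try {
            Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
            p.getOutputStream().close();
            // Drain the output so the child process cannot block on a full pipe.
            try (InputStream in = p.getInputStream()) {
                while (in.read() != -1) {
                    // discard
                }
            }
            p.waitFor();
            return true;
        } catch (IOException e) {
            // Thrown when the program is not found or cannot be executed.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** 5 div start events with Tesseract installed, 4 without, as reported in this thread. */
    static int expectedDivCount(boolean tesseractAvailable) {
        return tesseractAvailable ? 5 : 4;
    }

    public static void main(String[] args) {
        boolean available = canRun("tesseract");
        System.out.println("tesseract available: " + available
                + ", expected div count: " + expectedDivCount(available));
    }
}
```

In the test itself, the resulting number would presumably feed Mockito's times(...) verification in place of the hard-coded 4.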

I'd like the fourth option the most, but it requires some funky logic in 
TesseractOCRParser.

Thoughts?




[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-09-24 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146283#comment-14146283
 ] 

Tim Allison commented on TIKA-1422:
---

While work is going on to get the TesseractOCRParser tests to pass on systems 
with and without Tesseract, would it be possible to temporarily ignore or 
comment out the things that are causing failures so that trunk will build 
cleanly?

I got a clean build if I removed TesseractOCRParser from the services list and 
commented out this line in TikaMimeTypesTest:
{noformat}
  assertEquals("org.apache.tika.parser.ocr.TesseractOCRParser", 
bmp.get("parser"));
{noformat}
 
To be clear, I'm extremely grateful for all of the work that has gone into 
integrating OCR, and apologies if you are just about to commit the fixes!


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-09-24 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146523#comment-14146523
 ] 

Tyler Palsulich commented on TIKA-1422:
---

The Hudson builds are now stable with the fix from TIKA-1421. So, this is only 
a failure when Tesseract is installed. It has something to do with how 
attachments are parsed, but I'm not sure exactly what this test is or why it's 
failing. As I understand it, there are 4 invocations of the handler without 
Tesseract installed and 5 with. So, it may not be an actual problem...

But, if you think we should disable it temporarily, that's fine by me! We could 
also comment out the failing Assert in this test.
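
To make the 4-versus-5 count concrete, here is a minimal sketch in plain Java. It is not the actual test (which uses Mockito against a real parse); DivCountingHandler and DivCountDemo are hypothetical names, and the assumption that the fifth div comes from the OCR output of an image attachment is inferred from the TesseractOCRParser.extractOutput frame in the trace.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the mocked handler: counts XHTML <div> start events.
class DivCountingHandler extends DefaultHandler {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    int divStarts = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (XHTML.equals(uri) && "div".equals(localName)) {
            divStarts++;
        }
    }
}

class DivCountDemo {
    // Simulates the SAX events of the multipart message. The four baseline
    // divs match the count the test expects; the fifth is an assumption
    // about what the OCR parser adds when Tesseract is installed.
    static int countDivs(boolean tesseractInstalled) {
        DivCountingHandler handler = new DivCountingHandler();
        Attributes noAttributes = new AttributesImpl();
        for (int part = 0; part < 4; part++) {
            handler.startElement(DivCountingHandler.XHTML, "div", "div", noAttributes);
        }
        if (tesseractInstalled) {
            // Extra div wrapping the OCR text of the image attachment.
            handler.startElement(DivCountingHandler.XHTML, "div", "div", noAttributes);
        }
        return handler.divStarts;
    }

    public static void main(String[] args) {
        System.out.println("without Tesseract: " + countDivs(false));
        System.out.println("with Tesseract:    " + countDivs(true));
    }
}
```

If that reading is right, an exact-count verification is environment-dependent, so the expected count would have to account for whether Tesseract is present.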


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-09-24 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146537#comment-14146537
 ] 

Tim Allison commented on TIKA-1422:
---

Sorry, user error.  Needed to force update.  Thank you!


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-09-23 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145218#comment-14145218
 ] 

Chris A. Mattmann commented on TIKA-1422:
-

Hmm, good idea, Luis. I'll try to explore that in my patch here. 


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-09-23 Thread Luis Filipe Nassif (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145244#comment-14145244
 ] 

Luis Filipe Nassif commented on TIKA-1422:
--

Also, regardless of the tests, trunk currently affects users who want image 
metadata only, because TesseractOCRParser is now the default image parser and 
it does not extract metadata. That approach will resolve this problem too. 
Another approach is to remove TesseractOCRParser from the default service 
provider parser list and to call it from the existing image parsers if a 
TesseractOCRConfig object is found in the ParseContext object. It is strange 
to see a TesseractOCRParser in the trace if the user only wants image metadata.
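
The second approach (run OCR from the image parser only when a config object is present in the context) could look roughly like the sketch below. It deliberately avoids the real Tika classes: SketchContext, SketchOcrConfig and SketchImageParser are hypothetical stand-ins, and the metadata and OCR values are placeholders, not real output.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a parse context: a type-keyed bag of objects.
class SketchContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key) {
        return key.cast(entries.get(key)); // null when nothing was set
    }
}

// Hypothetical stand-in for an OCR configuration object.
class SketchOcrConfig {
    final String language;

    SketchOcrConfig(String language) {
        this.language = language;
    }
}

// Hypothetical image parser: metadata is always extracted, OCR is opt-in.
class SketchImageParser {
    static Map<String, String> parse(String imageName, SketchContext context) {
        Map<String, String> result = new LinkedHashMap<>();
        result.put("resourceName", imageName); // placeholder for real metadata
        SketchOcrConfig config = context.get(SketchOcrConfig.class);
        if (config != null) {
            // Placeholder for the OCR call; only reached when the caller opted in.
            result.put("ocrText", "ocr(" + imageName + ", lang=" + config.language + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        SketchContext metadataOnly = new SketchContext();
        SketchContext withOcr = new SketchContext();
        withOcr.set(SketchOcrConfig.class, new SketchOcrConfig("eng"));
        System.out.println(parse("scan.png", metadataOnly));
        System.out.println(parse("scan.png", withOcr));
    }
}
```

With this shape, a user who only wants image metadata never triggers OCR, because the OCR branch depends on an explicit config object rather than on whether Tesseract happens to be installed.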


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-09-23 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145253#comment-14145253
 ] 

Chris A. Mattmann commented on TIKA-1422:
-

I prefer to keep it as an available parser, so I'll have it delegate and call 
the image parsers to grab the metadata as well. Thanks for the suggestion, Luis!
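
A rough sketch of that delegation idea follows, using hypothetical stand-in types rather than the real Tika Parser interface: the OCR parser first hands the image to a metadata parser and then appends its own text. SketchParser, SketchMetadataParser and SketchOcrParser are made-up names, and the metadata value and OCR text are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified parser contract (not the Tika Parser interface).
interface SketchParser {
    void parse(String resource, Map<String, String> metadata, StringBuilder text);
}

// Stand-in for an existing image parser that only fills in metadata.
class SketchMetadataParser implements SketchParser {
    @Override
    public void parse(String resource, Map<String, String> metadata, StringBuilder text) {
        metadata.put("Content-Type", "image/png"); // placeholder value
    }
}

// Stand-in for the OCR parser: delegates for metadata, then adds OCR text.
class SketchOcrParser implements SketchParser {
    private final SketchParser metadataDelegate;
    private final boolean ocrAvailable;

    SketchOcrParser(SketchParser metadataDelegate, boolean ocrAvailable) {
        this.metadataDelegate = metadataDelegate;
        this.ocrAvailable = ocrAvailable;
    }

    @Override
    public void parse(String resource, Map<String, String> metadata, StringBuilder text) {
        // Metadata is extracted whether or not OCR can run.
        metadataDelegate.parse(resource, metadata, text);
        if (ocrAvailable) {
            text.append("[ocr text of ").append(resource).append("]"); // placeholder
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        new SketchOcrParser(new SketchMetadataParser(), true).parse("scan.png", metadata, text);
        System.out.println(metadata + " " + text);
    }
}
```

The point of the design is that the OCR parser stays registered as an available parser while users still get the image metadata they got before.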


[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-09-23 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145299#comment-14145299
 ] 

Tyler Palsulich commented on TIKA-1422:
---

This assumes that if a user has Tesseract installed, they want to run it on the 
image, which may not be true. But I suppose they could call ImageParser 
directly if they want the performance boost.

+1 to delegating to ImageParser to parse the image metadata from 
TesseractOCRParser.

 org.apache.tika.parser.mail.RFC822ParserTest fails
 --

 Key: TIKA-1422
 URL: https://issues.apache.org/jira/browse/TIKA-1422
 Project: Tika
  Issue Type: Bug
  Components: parser
Reporter: Chris A. Mattmann
 Fix For: 1.7


 I'm seeing test failures from:
 {noformat}
 Results :
 Failed tests:   testMultipart(org.apache.tika.parser.mail.RFC822ParserTest): 
 (..)
 Tests run: 538, Failures: 1, Errors: 0, Skipped: 1
 {noformat}
 CentOS6 VM image, running:
 {noformat}
 [mattmann@memex tika]$ java -version
 java version 1.7.0_67
 Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
 Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
 [mattmann@memex tika]$ mvn -version
 Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
 2014-02-14T09:37:52-08:00)
 Maven home: /usr/share/apache-maven
 Java version: 1.7.0_65, vendor: Oracle Corporation
 Java home: /data/home/mattmann/dist/jdk1.7.0_65/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: linux, version: 2.6.32-431.23.3.el6.centos.plus.x86_64, arch: 
 amd64, family: unix
 [mattmann@memex tika]$ 
 {noformat}
 Here are the surefire reports - no clue what's up here:
 {noformat}
 [mattmann@memex tika]$ more 
 tika-parsers/target/surefire-reports/org.apache.tika.parser.mail.RFC822ParserTest.txt
  
 ---
 Test set: org.apache.tika.parser.mail.RFC822ParserTest
 ---
 Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.699 sec  
 FAILURE!
 testMultipart(org.apache.tika.parser.mail.RFC822ParserTest)  Time elapsed: 
 0.152 sec   FAILURE!
 org.mockito.exceptions.verification.TooManyActualInvocations: 
 xHTMLContentHandler.startElement(
 http://www.w3.org/1999/xhtml;,
 div,
 div,
 isA(org.xml.sax.Attributes)
 );
 Wanted 4 times but was 5
   at 
 org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:87)
 Caused by: org.mockito.exceptions.cause.UndesiredInvocation: 
 Undesired invocation:
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
   at 
 org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.xpath.MatchingContentHandler.startElement(MatchingContentHandler.java:60)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
   at 
 org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
   at 
 org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:254)
   at 
 org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:284)
   at 
 org.apache.tika.parser.ocr.TesseractOCRParser.extractOutput(TesseractOCRParser.java:243)
   at 
 org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:155)
   at 
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
   at 
 org.apache.tika.parser.mail.MailContentHandler.body(MailContentHandler.java:102)
   at 
 org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:133)
   at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:76)
   at 
 org.apache.tika.parser.mail.RFC822ParserTest.testMultipart(RFC822ParserTest.java:84)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at