Priya Kujur created TIKA-972:
--------------------------------
Summary: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser
Key: TIKA-972
URL: https://issues.apache.org/jira/browse/TIKA-972
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.9
Environment: Core Java, Windows Server 2003
Reporter: Priya Kujur
While extracting text from a PDF, Tika throws a runtime exception. The exception
is not thrown when the Java code runs on Windows 7, but it is thrown when the
same code runs on Windows Server 2003.
This is strange: my development environment is Windows 7 and my production
environment is Windows Server 2003. Since Java is platform independent, this
issue is driving me crazy.
Any help is much appreciated.
Please check the stack trace:
java.io.IOException:
at org.apache.tika.parser.ParsingReader.read(ParsingReader.java:271)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at com.servient.utilities.textmanipulation.ReaderUtil.readBuffer(ReaderUtil.java:39)
at com.servient.mapi.metadata.factory.TikaMetaDataExport.processFile(TikaMetaDataExport.java:255)
at com.servient.mapi.metadata.factory.BaseMetadataExport.process(BaseMetadataExport.java:37)
at com.servient.mapi.wrapper.AttachmentWrapper.saveTextMetadataExtract(AttachmentWrapper.java:116)
at com.servient.mapi.wrapper.AttachmentWrapper.process(AttachmentWrapper.java:40)
at com.servient.mapi.wrapper.AttachmentWrapper.<init>(AttachmentWrapper.java:36)
at com.servient.mapi.wrapper.MessageWrapper.writeCatalog(MessageWrapper.java:761)
at com.servient.mapi.wrapper.MessageWrapper.writeCatalog(MessageWrapper.java:754)
at com.servient.mapi.wrapper.MessageWrapper.process(MessageWrapper.java:804)
at com.servient.mapi.MAPI.main(MAPI.java:190)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@ea0a39
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:232)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.util.TimSort.mergeHi(Unknown Source)
at java.util.TimSort.mergeAt(Unknown Source)
at java.util.TimSort.mergeCollapse(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.Arrays.sort(Unknown Source)
at java.util.Collections.sort(Unknown Source)
at org.apache.pdfbox.util.PDFTextStripper.writePage(PDFTextStripper.java:551)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:443)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:89)
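
Note on the innermost cause: the java.util.TimSort frames indicate Java 7 or later, where TimSort became the default sort for object arrays and collections and can throw this IllegalArgumentException when it detects an inconsistent Comparator. One possible explanation for the difference between the two machines is therefore the Java runtime version rather than the operating system. The sketch below is an illustration under assumptions only: ComparatorContractCheck, FUZZY, TOLERANCE, violatesContract and describeRuntime are hypothetical names, and the tolerance-based comparator is a made-up example, not the comparator PDFBox actually uses. It prints the runtime details worth comparing on both machines and shows how a comparator that treats nearby values as equal can break the Comparator contract.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of Tika or PDFBox.
final class ComparatorContractCheck {

    // Assumed tolerance for the made-up comparator below.
    static final double TOLERANCE = 1.0;

    // Illustrative comparator: values closer than TOLERANCE count as equal.
    // This is NOT PDFBox's comparator, only an example of the failure mode.
    static final Comparator<Double> FUZZY = (a, b) -> {
        if (Math.abs(a - b) < TOLERANCE) {
            return 0;
        }
        return a < b ? -1 : 1;
    };

    // Returns true if the comparator breaks sign symmetry, transitivity, or
    // the rule that compare(x, y) == 0 implies x and y order the same way
    // against every z, for any combination of elements in the sample.
    static <T> boolean violatesContract(Comparator<T> cmp, List<T> sample) {
        for (T x : sample) {
            for (T y : sample) {
                int xy = Integer.signum(cmp.compare(x, y));
                if (xy != -Integer.signum(cmp.compare(y, x))) {
                    return true; // sign symmetry broken
                }
                for (T z : sample) {
                    int xz = Integer.signum(cmp.compare(x, z));
                    int yz = Integer.signum(cmp.compare(y, z));
                    if (xy == 0 && xz != yz) {
                        return true; // "equal" elements disagree about z
                    }
                    if (xy > 0 && yz > 0 && xz <= 0) {
                        return true; // transitivity broken
                    }
                }
            }
        }
        return false;
    }

    // Runtime details to compare between the two machines.
    static String describeRuntime() {
        return System.getProperty("java.version") + " / "
                + System.getProperty("java.vendor") + " / "
                + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        System.out.println("Runtime: " + describeRuntime());
        List<Double> sample = Arrays.asList(0.0, 0.6, 1.2);
        System.out.println("fuzzy comparator violates contract: "
                + violatesContract(FUZZY, sample));
        System.out.println("natural order violates contract: "
                + violatesContract(Comparator.<Double>naturalOrder(), sample));
    }
}
```

As a stopgap on Java 7 and later, starting the JVM with -Djava.util.Arrays.useLegacyMergeSort=true restores the older merge sort, which does not perform this check. It hides the inconsistent comparator rather than fixing it.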