[ 
https://issues.apache.org/jira/browse/TIKA-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185726#comment-14185726
 ] 

Tadeu Alves commented on TIKA-1457:
-----------------------------------

Thanks again, Tim, for your help.

This is a homologation (staging) environment, which is why I'm testing like
this.

I want to see whether this will fix all of my indexing problems until Solr 5.0
comes out. I'm monitoring my Solr server for memory leaks or CPU stress.

Nothing has gone wrong so far; I'll post the final result tomorrow.

> NullPointerException in tika-app, parsing PDF content
> -----------------------------------------------------
>
>                 Key: TIKA-1457
>                 URL: https://issues.apache.org/jira/browse/TIKA-1457
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>         Environment: OS - Linux Centos 6.5
> Web APP - Tomcat6
> Using Solr 4.10
> Tika Jar
>           * tika-core-1.5.jar
>           * tika-parsers-1.5.jar
>           * tika-xmp-1.5.jar
>           * pdfbox-1.8.4.jar
>            Reporter: Tadeu Alves
>              Labels: bug, parser, solr, tika,text-extraction
>             Fix For: 1.6
>
>
> When I try to extract text from some PDF files with the tika app 1.5, I get:
> null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@52cfcf01
>       at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
>       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>       at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>       at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>       at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>       at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>       at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>       at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>       at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>       at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
>       at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>       at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@52cfcf01
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
>       ... 19 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 0
>       at java.lang.String.charAt(String.java:658)
>       at org.apache.pdfbox.util.DateConverter.parseDate(DateConverter.java:680)
>       at org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:808)
>       at org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:780)
>       at org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:754)
>       at org.apache.pdfbox.cos.COSDictionary.getDate(COSDictionary.java:797)
>       at org.apache.pdfbox.pdmodel.PDDocumentInformation.getModificationDate(PDDocumentInformation.java:232)
>       at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:176)
>       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:142)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       ... 22 more
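
For readers following the trace: the innermost "Caused by" is a
StringIndexOutOfBoundsException with index 0 raised from String.charAt inside
DateConverter.parseDate, which suggests the PDF's modification date was an
empty (or effectively empty) string. The sketch below is only an illustration
of the kind of guard that avoids this class of failure. It is not the actual
PDFBox or Tika fix; the class SafePdfDate and the method parseOrNull are
hypothetical names, and the sketch assumes a plain "D:yyyyMMddHHmmss" value
interpreted as UTC, ignoring the time-zone suffixes real PDF dates may carry.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

public class SafePdfDate {

    /**
     * Hypothetical helper (not part of PDFBox or Tika): parses a PDF-style
     * date such as "D:20141027153000" and returns null, instead of throwing,
     * when the value is missing, empty, too short, or malformed.
     */
    public static Calendar parseOrNull(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw.trim();
        if (s.isEmpty()) {
            // Assumed to be the situation behind "String index out of range: 0".
            return null;
        }
        if (s.startsWith("D:")) {
            s = s.substring(2);
        }
        if (s.length() < 14) {
            // This sketch only handles the full yyyyMMddHHmmss form.
            return null;
        }
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setLenient(false);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: treat as UTC
        try {
            Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
            cal.setTime(fmt.parse(s.substring(0, 14)));
            return cal;
        } catch (ParseException e) {
            // Malformed metadata should not abort text extraction.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("") == null);
        System.out.println(parseOrNull("D:20141027153000").get(Calendar.YEAR));
    }
}
```

Returning null for bad metadata lets the caller skip the date field and still
index the document text, which is usually the more useful outcome for a Solr
extraction pipeline.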



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)