[ 
https://issues.apache.org/jira/browse/TIKA-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872309#comment-17872309
 ] 

Hudson commented on TIKA-4296:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1736 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1736/])
TIKA-4296: use valid start page (tilman: 
[https://github.com/apache/tika/commit/7fd48d659a9b3dec583154dabd6971998b282d2b])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java


> "Parameter must be 1-based, but is -1" when using Tika with PDFBox 2.0.32
> -------------------------------------------------------------------------
>
>                 Key: TIKA-4296
>                 URL: https://issues.apache.org/jira/browse/TIKA-4296
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.9.2
>            Reporter: Thomas Mortagne
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 3.0.0, 2.9.3
>
>         Attachments: pdf.pdf
>
>
> I just upgraded my pdfbox dependency to 2.0.32, and every Tika#parseToString 
> call on a PDF file now seems to produce the following warning:
> {noformat}
> WARN  o.apache.pdfbox.text.PDFTextStripper - Parameter must be 1-based, but is -1
> {noformat}
> The behavior is otherwise the same as with 2.0.31; PDFBox just apparently 
> complains now about the way Tika uses it.
> This new warning was apparently introduced by PDFBOX-5822.
> In case this does not happen with every file, here is one that reproduces 
> it: [^pdf.pdf] 
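
For readers unfamiliar with this kind of warning, here is a minimal, self-contained sketch of a setter that validates a 1-based page number. It is purely illustrative: the class `PageRangeSketch` and its methods are hypothetical and are not the PDFBox or Tika source. It assumes the invalid value is logged and otherwise ignored, which would be consistent with the report that the output is unchanged and only a warning appears.

```java
import java.util.logging.Logger;

// Hypothetical illustration only; NOT the PDFBox or Tika implementation.
final class PageRangeSketch {
    private static final Logger LOG = Logger.getLogger(PageRangeSketch.class.getName());

    // Page numbers are 1-based: the first page is page 1.
    private int startPage = 1;

    // Accepts only 1-based page numbers; any other value is logged and ignored
    // (assumed behavior for this sketch).
    void setStartPage(int startPageValue) {
        if (startPageValue > 0) {
            startPage = startPageValue;
        } else {
            LOG.warning("Parameter must be 1-based, but is " + startPageValue);
        }
    }

    int getStartPage() {
        return startPage;
    }

    public static void main(String[] args) {
        PageRangeSketch stripper = new PageRangeSketch();
        stripper.setStartPage(-1); // logs a warning; the start page is left unchanged
        stripper.setStartPage(3);  // valid 1-based value; the start page is updated
        System.out.println(stripper.getStartPage());
    }
}
```

Under that assumption, a caller that passes -1 as a "not set" marker still gets the same result as before but triggers a log line on every call, so the caller-side remedy is to pass a valid 1-based start page instead.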



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
