[ 
https://issues.apache.org/jira/browse/PDFBOX-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847504#comment-17847504
 ] 

Andreas Lehmkühler commented on PDFBOX-5822:
--------------------------------------------

The current page number of the TextStripper is 1-based. IMHO the 
initialization/increment of the value is suboptimal. Setting {{currentPageNo}} 
to 1 in {{resetEngine}} and incrementing it after the current page is processed 
should do the trick.
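
To illustrate the idea, here is a minimal sketch of a 1-based page counter that is set to 1 on reset and incremented only after each page has been handled. It is a simplified stand-in, not the actual PDFBox code: the class {{PageCounterSketch}} and the method {{processPages}} are hypothetical names, and only {{currentPageNo}}, {{resetEngine}}, {{setStartPage}} and the exception message are taken from this issue.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a stripper's page counter.
 * This is NOT PDFBox code; it only illustrates the proposed ordering.
 */
class PageCounterSketch {
    private int currentPageNo;
    private int startPage = 1;
    private int endPage = Integer.MAX_VALUE;

    /** Rejects 0 and negative values, mirroring the 1-based check from the issue. */
    void setStartPage(int startPage) {
        if (startPage < 1) {
            throw new IllegalArgumentException(
                    "Parameter must be 1-based, but is " + startPage);
        }
        this.startPage = startPage;
    }

    void setEndPage(int endPage) {
        if (endPage < 1) {
            throw new IllegalArgumentException(
                    "Parameter must be 1-based, but is " + endPage);
        }
        this.endPage = endPage;
    }

    /** Start every run at page 1, so the counter is always 1-based. */
    private void resetEngine() {
        currentPageNo = 1;
    }

    /**
     * Walks over pageCount pages (assumed parameter standing in for a document)
     * and returns the page numbers that fall inside [startPage, endPage].
     */
    List<Integer> processPages(int pageCount) {
        resetEngine();
        List<Integer> processed = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            if (currentPageNo >= startPage && currentPageNo <= endPage) {
                processed.add(currentPageNo);
            }
            // Increment only after the current page has been handled.
            currentPageNo++;
        }
        return processed;
    }
}
```

With this ordering the counter never holds 0 while a page is being processed, so comparing it against the 1-based start and end pages needs no offset. In the sketch, calling {{setStartPage(2)}} and then {{processPages(3)}} yields pages 2 and 3, while {{setStartPage(0)}} is rejected with the exception quoted in the issue.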

> IllegalArgumentException: Parameter must be 1-based, but is 0 when using 
> PDFTextStripperByArea
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5822
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5822
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.32, 4.0.0, 3.0.3 PDFBox
>            Reporter: Tilman Hausherr
>            Priority: Major
>              Labels: regression
>             Fix For: 2.0.32, 4.0.0, 3.0.3 PDFBox
>
>
> As reported by Pascal Schumacher in the users mailing list 
> https://lists.apache.org/thread/yb42j9s5vp8jsjog9msplbc05y1xqwv3
> java.lang.IllegalArgumentException: Parameter must be 1-based, but is 0
>       at org.apache.pdfbox.text.PDFTextStripper.setStartPage(PDFTextStripper.java:956)
>       at org.apache.pdfbox.text.PDFTextStripperByArea.extractRegions(PDFTextStripperByArea.java:117)
> this is because of this earlier seemingly "harmless" commit
> https://github.com/apache/pdfbox/commit/5c0abf94367c12c9ac0b464046784d456ce4caf5
> that broke PDFTextStripperByArea because it has two calls with 0 parameter.
> This wasn't discovered because we have no tests for PDFTextStripperByArea 😬



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

