[ https://issues.apache.org/jira/browse/PDFBOX-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847504#comment-17847504 ]
Andreas Lehmkühler commented on PDFBOX-5822:
--------------------------------------------

The current page number of the TextStripper is 1-based. IMHO the initialization/increment of the value is suboptimal. Setting {{currentPageNo}} to 1 in {{resetEngine}} and incrementing it after the current page is processed should do the trick.

> IllegalArgumentException: Parameter must be 1-based, but is 0 when using PDFTextStripperByArea
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5822
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5822
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.32, 4.0.0, 3.0.3 PDFBox
>            Reporter: Tilman Hausherr
>            Priority: Major
>              Labels: regression
>             Fix For: 2.0.32, 4.0.0, 3.0.3 PDFBox
>
>
> As reported by Pascal Schumacher in the users mailing list
> https://lists.apache.org/thread/yb42j9s5vp8jsjog9msplbc05y1xqwv3
> java.lang.IllegalArgumentException: Parameter must be 1-based, but is 0
>     at org.apache.pdfbox.text.PDFTextStripper.setStartPage(PDFTextStripper.java:956)
>     at org.apache.pdfbox.text.PDFTextStripperByArea.extractRegions(PDFTextStripperByArea.java:117)
> this is because of this earlier seemingly "harmless" commit
> https://github.com/apache/pdfbox/commit/5c0abf94367c12c9ac0b464046784d456ce4caf5
> that broke PDFTextStripperByArea because it has two calls with 0 parameter.
> This wasn't discovered because we have no tests for PDFTextStripperByArea 😬

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
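
For illustration, the counter handling suggested in the comment might look roughly like the sketch below. This is not PDFBox source code: the class {{PageCounterSketch}} and the method {{processPages}} are hypothetical stand-ins, and only the names {{currentPageNo}}, {{resetEngine}} and {{setStartPage}} plus the exception message are taken from this thread. It assumes the counter is set to 1 on reset, incremented only after a page has been handled, and that 0 is rejected as a start page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT the PDFBox implementation.
public class PageCounterSketch {
    private int currentPageNo;                 // 1-based number of the page being handled
    private int startPage = 1;                 // 1-based, inclusive
    private int endPage = Integer.MAX_VALUE;   // 1-based, inclusive

    public void setStartPage(int startPageValue) {
        // Reject 0 and negatives, mirroring the message quoted in the issue.
        if (startPageValue < 1) {
            throw new IllegalArgumentException(
                    "Parameter must be 1-based, but is " + startPageValue);
        }
        startPage = startPageValue;
    }

    public void setEndPage(int endPageValue) {
        if (endPageValue < 1) {
            throw new IllegalArgumentException(
                    "Parameter must be 1-based, but is " + endPageValue);
        }
        endPage = endPageValue;
    }

    private void resetEngine() {
        // Start at 1 so the counter is valid before the first page is handled.
        currentPageNo = 1;
    }

    // Returns the 1-based numbers of the pages that fall inside [startPage, endPage].
    public List<Integer> processPages(int pageCount) {
        resetEngine();
        List<Integer> processed = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            if (currentPageNo >= startPage && currentPageNo <= endPage) {
                processed.add(currentPageNo);
            }
            // Increment only after the current page has been handled.
            currentPageNo++;
        }
        return processed;
    }

    public static void main(String[] args) {
        PageCounterSketch sketch = new PageCounterSketch();
        sketch.setStartPage(2);
        sketch.setEndPage(3);
        System.out.println(sketch.processPages(5)); // prints [2, 3]
    }
}
```

With this ordering the counter never holds 0 while a page is being handled, so callers that want to process everything would pass 1 rather than 0 as the start page.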