[jira] [Comment Edited] (PDFBOX-5365) (定位不对)location is bad
[ https://issues.apache.org/jira/browse/PDFBOX-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476999#comment-17476999 ] Tilman Hausherr edited comment on PDFBOX-5365 at 1/17/22, 7:30 AM: --- Likely a problem with the cropbox, at least in PDFDebugger. The other two boxes (cyan and blue) are at the correct position. I haven't tested PrintTextLocations yet. !screenshot-1.png! was (Author: tilman): Likely a problem with the cropbox, at least in PDFDebugger. The other two are at the correct position. I didn't test PrintTextLocations yet. !screenshot-1.png! > (定位不对)location is bad > - > > Key: PDFBOX-5365 > URL: https://issues.apache.org/jira/browse/PDFBOX-5365 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.25 >Reporter: dv >Priority: Major > Attachments: XyWordPDFTextStripper.java, XyWordTest.java, > screenshot-1.png, test.pdf, test_wordFraming.pdf > > > Text positioning is incorrect; see the attachments for details. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Comment Edited] (PDFBOX-5365) (定位不对)location is bad
[ https://issues.apache.org/jira/browse/PDFBOX-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476999#comment-17476999 ] Tilman Hausherr edited comment on PDFBOX-5365 at 1/17/22, 7:27 AM: --- Likely a problem with the cropbox, at least in PDFDebugger. The other two are at the correct position. I didn't test PrintTextLocations yet. !screenshot-1.png! was (Author: tilman): Likely a problem with the cropbox, at least in PDFDebugger. I didn't test PrintTextLocations yet. !screenshot-1.png! > (定位不对)location is bad > - > > Key: PDFBOX-5365 > URL: https://issues.apache.org/jira/browse/PDFBOX-5365 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.25 >Reporter: dv >Priority: Major > Attachments: XyWordPDFTextStripper.java, XyWordTest.java, > screenshot-1.png, test.pdf, test_wordFraming.pdf > > > Text positioning is incorrect; see the attachments for details.
[jira] [Commented] (PDFBOX-5365) (定位不对)location is bad
[ https://issues.apache.org/jira/browse/PDFBOX-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476999#comment-17476999 ] Tilman Hausherr commented on PDFBOX-5365: - Likely a problem with the cropbox, at least in PDFDebugger. I didn't test PrintTextLocations yet. !screenshot-1.png! > (定位不对)location is bad > - > > Key: PDFBOX-5365 > URL: https://issues.apache.org/jira/browse/PDFBOX-5365 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.25 >Reporter: dv >Priority: Major > Attachments: XyWordPDFTextStripper.java, XyWordTest.java, > screenshot-1.png, test.pdf, test_wordFraming.pdf > > > Text positioning is incorrect; see the attachments for details.
[jira] [Updated] (PDFBOX-5365) (定位不对)location is bad
[ https://issues.apache.org/jira/browse/PDFBOX-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5365: Attachment: screenshot-1.png > (定位不对)location is bad > - > > Key: PDFBOX-5365 > URL: https://issues.apache.org/jira/browse/PDFBOX-5365 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.25 >Reporter: dv >Priority: Major > Attachments: XyWordPDFTextStripper.java, XyWordTest.java, > screenshot-1.png, test.pdf, test_wordFraming.pdf > > > Text positioning is incorrect; see the attachments for details.
[jira] [Comment Edited] (PDFBOX-5364) Picture position accuracy problem
[ https://issues.apache.org/jira/browse/PDFBOX-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476997#comment-17476997 ] Tilman Hausherr edited comment on PDFBOX-5364 at 1/17/22, 7:24 AM: --- likely a floating point math problem :-( The rendering is less bad at a higher dpi, but the problem is still there. was (Author: tilman): likely a floating point math problem :-( > Picture position accuracy problem > - > > Key: PDFBOX-5364 > URL: https://issues.apache.org/jira/browse/PDFBOX-5364 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.25, 3.0.0 PDFBox > Environment: windows, linux >Reporter: zhaoyuanli >Priority: Major > Attachments: image-2022-01-17-11-48-01-920.png, test1.pdf > > > * The image displayed is actually a combination of many images with a height > of 1 pixel > * However, after processing in pdfBox, some pictures seem to be missing. > The display in Adobe is correct > * Please see attached "test1.pdf" for the document. thanks~~ > > !image-2022-01-17-11-48-01-920.png!
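To illustrate how a floating point or rounding issue could make 1-pixel-high image strips disappear, here is a minimal sketch. It is not PDFBox's rendering code; the class `StripCoverageSketch`, the method `uncoveredRows` and both placement strategies are assumptions made up for illustration. It stacks `strips` images of height 1 at a given device scale and counts device rows that no strip covers, once with a truncated height and once with edges computed from the exact strip boundaries.

```java
// Hypothetical illustration only; not taken from the PDFBox code base.
class StripCoverageSketch {

    // Counts device rows left uncovered when `strips` unit-height images
    // are stacked vertically and scaled by `scale`.
    static int uncoveredRows(int strips, double scale, boolean edgeBased) {
        int totalRows = (int) Math.ceil(strips * scale);
        boolean[] covered = new boolean[totalRows];
        for (int i = 0; i < strips; i++) {
            int top = (int) Math.floor(i * scale);
            // Truncating variant: the height is rounded down on its own.
            // Edge-based variant: the bottom edge is derived from the exact boundary.
            int bottom = edgeBased
                    ? (int) Math.ceil((i + 1) * scale)
                    : top + Math.max(1, (int) scale);
            for (int row = top; row < bottom && row < totalRows; row++) {
                covered[row] = true;
            }
        }
        int missing = 0;
        for (boolean c : covered) {
            if (!c) {
                missing++;
            }
        }
        return missing;
    }
}
```

With four strips and a scale of 1.5, the truncating variant leaves rows 2 and 5 uncovered, while the edge-based variant covers all six rows. Whether the actual bug follows this pattern is not confirmed by the thread.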
[jira] [Commented] (PDFBOX-5364) Picture position accuracy problem
[ https://issues.apache.org/jira/browse/PDFBOX-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476997#comment-17476997 ] Tilman Hausherr commented on PDFBOX-5364: - likely a floating point math problem :-( > Picture position accuracy problem > - > > Key: PDFBOX-5364 > URL: https://issues.apache.org/jira/browse/PDFBOX-5364 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.25, 3.0.0 PDFBox > Environment: windows, linux >Reporter: zhaoyuanli >Priority: Major > Attachments: image-2022-01-17-11-48-01-920.png, test1.pdf > > > * The image displayed is actually a combination of many images with a height > of 1 pixel > * However, after processing in pdfBox, some pictures seem to be missing. > The display in Adobe is correct > * Please see attached "test1.pdf" for the document. thanks~~ > > !image-2022-01-17-11-48-01-920.png!
[jira] [Updated] (PDFBOX-5364) Picture position accuracy problem
[ https://issues.apache.org/jira/browse/PDFBOX-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5364: Priority: Major (was: Critical) > Picture position accuracy problem > - > > Key: PDFBOX-5364 > URL: https://issues.apache.org/jira/browse/PDFBOX-5364 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.25, 3.0.0 PDFBox > Environment: windows, linux >Reporter: zhaoyuanli >Priority: Major > Attachments: image-2022-01-17-11-48-01-920.png, test1.pdf > > > * The image displayed is actually a combination of many images with a height > of 1 pixel > * However, after processing in pdfBox, some pictures seem to be missing. > The display in Adobe is correct > * Please see attached "test1.pdf" for the document. thanks~~ > > !image-2022-01-17-11-48-01-920.png!
[jira] [Updated] (PDFBOX-5364) Picture position accuracy problem
[ https://issues.apache.org/jira/browse/PDFBOX-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5364: Affects Version/s: 2.0.25 3.0.0 PDFBox (was: 3.0.0 JBIG2) (was: 3.0.1 JBIG2) (was: 3.0.2 JBIG2) (was: 3.0.3 JBIG2) > Picture position accuracy problem > - > > Key: PDFBOX-5364 > URL: https://issues.apache.org/jira/browse/PDFBOX-5364 > Project: PDFBox > Issue Type: Bug > Components: Parsing, Rendering >Affects Versions: 2.0.25, 3.0.0 PDFBox > Environment: windows, linux >Reporter: zhaoyuanli >Priority: Critical > Attachments: image-2022-01-17-11-48-01-920.png, test1.pdf > > > * The image displayed is actually a combination of many images with a height > of 1 pixel > * However, after processing in pdfBox, some pictures seem to be missing. > The display in Adobe is correct > * Please see attached "test1.pdf" for the document. thanks~~ > > !image-2022-01-17-11-48-01-920.png!
[jira] [Updated] (PDFBOX-5364) Picture position accuracy problem
[ https://issues.apache.org/jira/browse/PDFBOX-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5364: Component/s: (was: Parsing) > Picture position accuracy problem > - > > Key: PDFBOX-5364 > URL: https://issues.apache.org/jira/browse/PDFBOX-5364 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.25, 3.0.0 PDFBox > Environment: windows, linux >Reporter: zhaoyuanli >Priority: Critical > Attachments: image-2022-01-17-11-48-01-920.png, test1.pdf > > > * The image displayed is actually a combination of many images with a height > of 1 pixel > * However, after processing in pdfBox, some pictures seem to be missing. > The display in Adobe is correct > * Please see attached "test1.pdf" for the document. thanks~~ > > !image-2022-01-17-11-48-01-920.png!
[jira] [Updated] (PDFBOX-5365) (定位不对)location is bad
[ https://issues.apache.org/jira/browse/PDFBOX-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dv updated PDFBOX-5365: --- Description: Text positioning is incorrect; see the attachments for details. (was: Text positioning is incorrect) > (定位不对)location is bad > - > > Key: PDFBOX-5365 > URL: https://issues.apache.org/jira/browse/PDFBOX-5365 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.25 >Reporter: dv >Priority: Major > Attachments: XyWordPDFTextStripper.java, XyWordTest.java, test.pdf, > test_wordFraming.pdf > > > Text positioning is incorrect; see the attachments for details.
[jira] [Updated] (PDFBOX-5365) (定位不对)location is bad
[ https://issues.apache.org/jira/browse/PDFBOX-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dv updated PDFBOX-5365: --- Affects Version/s: 2.0.25 > (定位不对)location is bad > - > > Key: PDFBOX-5365 > URL: https://issues.apache.org/jira/browse/PDFBOX-5365 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.25 >Reporter: dv >Priority: Major > Attachments: XyWordPDFTextStripper.java, XyWordTest.java, test.pdf, > test_wordFraming.pdf > > > Text positioning is incorrect
[jira] [Updated] (PDFBOX-5365) (定位不对)location is bad
[ https://issues.apache.org/jira/browse/PDFBOX-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dv updated PDFBOX-5365: --- Description: Text positioning is incorrect > (定位不对)location is bad > - > > Key: PDFBOX-5365 > URL: https://issues.apache.org/jira/browse/PDFBOX-5365 > Project: PDFBox > Issue Type: Bug >Reporter: dv >Priority: Major > Attachments: XyWordPDFTextStripper.java, XyWordTest.java, test.pdf, > test_wordFraming.pdf > > > Text positioning is incorrect
[jira] [Created] (PDFBOX-5365) (定位不对)location is bad
dv created PDFBOX-5365: -- Summary: (定位不对)location is bad Key: PDFBOX-5365 URL: https://issues.apache.org/jira/browse/PDFBOX-5365 Project: PDFBox Issue Type: Bug Reporter: dv Attachments: XyWordPDFTextStripper.java, XyWordTest.java, test.pdf, test_wordFraming.pdf
[jira] [Created] (PDFBOX-5364) Picture position accuracy problem
zhaoyuanli created PDFBOX-5364: -- Summary: Picture position accuracy problem Key: PDFBOX-5364 URL: https://issues.apache.org/jira/browse/PDFBOX-5364 Project: PDFBox Issue Type: Bug Components: Parsing, Rendering Affects Versions: 3.0.3 JBIG2, 3.0.2 JBIG2, 3.0.1 JBIG2, 3.0.0 JBIG2 Environment: windows, linux Reporter: zhaoyuanli Attachments: image-2022-01-17-11-48-01-920.png, test1.pdf * The image displayed is actually a combination of many images with a height of 1 pixel * However, after processing in pdfBox, some pictures seem to be missing. The display in Adobe is correct * Please see attached "test1.pdf" for the document. thanks~~ !image-2022-01-17-11-48-01-920.png!
[GitHub] [pdfbox] famod opened a new pull request #140: PDFBOX-5363: Don't log warnings if there are not fonts to cache
famod opened a new pull request #140: URL: https://github.com/apache/pdfbox/pull/140 I kept `saveDiskCache()` out of the `if` block so that the cache file reflects reality (in case it was there already with whichever content). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (PDFBOX-5363) Don't log warnings if there are not fonts to cache
Falko Modler created PDFBOX-5363: Summary: Don't log warnings if there are not fonts to cache Key: PDFBOX-5363 URL: https://issues.apache.org/jira/browse/PDFBOX-5363 Project: PDFBox Issue Type: Improvement Affects Versions: 2.0.25 Reporter: Falko Modler I'm seeing these warnings: {noformat} 2022-01-16 16:49:05,723 WARN [org.apa.pdf.pdm.fon.FileSystemFontProvider] (executor-thread-10) Building on-disk font cache, this may take a while 2022-01-16 16:49:05,724 WARN [org.apa.pdf.pdm.fon.FileSystemFontProvider] (executor-thread-10) Finished building on-disk font cache, found 0 fonts {noformat} which are pretty much useless because creating a basically empty cache will *not* take a while. These warnings should only be logged if there actually are font files. PS: My case is a Quarkus app running as a docker container based on [azul/zulu-openjdk-alpine:17.0.1-jre-headless|https://hub.docker.com/layers/azul/zulu-openjdk-alpine/17.0.1-jre-headless/images/sha256-e248739ad4723f184b62616677979c32beeb36f33f71139dc1d0ff43add04e89?context=explore].
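The requested behaviour (log the two warnings only when font files exist, but still write the cache file either way) can be sketched roughly as below. This is a simplified stand-in, not the real `FileSystemFontProvider`: the class `FontCacheSketch`, the `log` list, the `cacheSaved` flag and the `buildCache` method are hypothetical names chosen for illustration. Only `saveDiskCache()` and the two message texts come from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the font provider; not the actual PDFBox class.
class FontCacheSketch {
    final List<String> log = new ArrayList<>();
    boolean cacheSaved = false;

    void buildCache(List<String> fontFiles) {
        boolean hasFonts = !fontFiles.isEmpty();
        if (hasFonts) {
            log.add("Building on-disk font cache, this may take a while");
        }
        int found = 0;
        for (String file : fontFiles) {
            // A real provider would parse each font file here.
            found++;
        }
        // Kept outside the if block so the cache file always reflects reality,
        // even if a stale cache file already exists.
        saveDiskCache();
        if (hasFonts) {
            log.add("Finished building on-disk font cache, found " + found + " fonts");
        }
    }

    // Placeholder for writing the cache file to disk.
    void saveDiskCache() {
        cacheSaved = true;
    }
}
```

With an empty list nothing is logged but the cache is still saved; with two files both messages appear.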
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476816#comment-17476816 ] ASF subversion and git services commented on PDFBOX-4892: - Commit 1897128 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1897128 ] PDFBOX-4892: LGTM fix > Improve code quality (4) > > > Key: PDFBOX-4892 > URL: https://issues.apache.org/jira/browse/PDFBOX-4892 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.20 >Reporter: Tilman Hausherr >Priority: Minor > > This is a longterm issue for the task to improve code quality, by using the > [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], > hints in different IDEs, the FindBugs tool and other code quality tools. > This is a follow-up of PDFBOX-4071, which was getting too long.
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476815#comment-17476815 ] ASF subversion and git services commented on PDFBOX-4892: - Commit 1897127 from Tilman Hausherr in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1897127 ] PDFBOX-4892: LGTM fix > Improve code quality (4) > > > Key: PDFBOX-4892 > URL: https://issues.apache.org/jira/browse/PDFBOX-4892 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.20 >Reporter: Tilman Hausherr >Priority: Minor > > This is a longterm issue for the task to improve code quality, by using the > [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], > hints in different IDEs, the FindBugs tool and other code quality tools. > This is a follow-up of PDFBOX-4071, which was getting too long.
[jira] [Resolved] (PDFBOX-4623) COSParser: Infinite recursion
[ https://issues.apache.org/jira/browse/PDFBOX-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler resolved PDFBOX-4623. Resolution: Fixed > COSParser: Infinite recursion > - > > Key: PDFBOX-4623 > URL: https://issues.apache.org/jira/browse/PDFBOX-4623 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.25, 3.0.0 PDFBox > Environment: java version "12" 2019-03-19 > Java(TM) SE Runtime Environment (build 12+33) > Java HotSpot(TM) 64-Bit Server VM (build 12+33, mixed mode, sharing) > MacOS Mojave >Reporter: Alex Rebert >Assignee: Andreas Lehmkühler >Priority: Minor > Fix For: 2.0.26, 3.0.0 PDFBox > > Attachments: infinite-recursion.pdf, loop_in_page_tree.pdf, > poppler-43279-0.pdf, poppler-91414-1.zip-2.gz-53.pdf > > > Parsing an invalid PDF can lead to an infinite recursion in COSParser, which > results in a StackOverflowError. > *Steps to repro* > # Download malformed PDF (attached) > # {{Run: java -jar pdfbox-app-2.0.16.jar ExtractText infinite-recursion.pdf}} > *Stacktrace* > {noformat} > Exception in thread "main" java.lang.StackOverflowError [1005/1916] > at java.base/sun.nio.cs.UTF_8.updatePositions(UTF_8.java:79) > at java.base/sun.nio.cs.UTF_8$Decoder.xflow(UTF_8.java:210) > at java.base/sun.nio.cs.UTF_8$Decoder.decodeArrayLoop(UTF_8.java:321) > at java.base/sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:414) > at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:578) > at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:801) > at org.apache.pdfbox.pdfparser.BaseParser.isValidUTF8(BaseParser.java:787) > at org.apache.pdfbox.pdfparser.BaseParser.parseCOSName(BaseParser.java:768) > at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:887) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:283) > at > 
org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:216) > at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:867) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:912) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > ... > {noformat} > The file was generated by fuzzing and is (probably) not a valid PDF file. 
[jira] [Commented] (PDFBOX-4623) COSParser: Infinite recursion
[ https://issues.apache.org/jira/browse/PDFBOX-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476774#comment-17476774 ] ASF subversion and git services commented on PDFBOX-4623: - Commit 1897120 from le...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1897120 ] PDFBOX-4623: mark indirect objects if dereferencing them is in progress to detect a possible recursion and to avoid a stack overflow > COSParser: Infinite recursion > - > > Key: PDFBOX-4623 > URL: https://issues.apache.org/jira/browse/PDFBOX-4623 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.25, 3.0.0 PDFBox > Environment: java version "12" 2019-03-19 > Java(TM) SE Runtime Environment (build 12+33) > Java HotSpot(TM) 64-Bit Server VM (build 12+33, mixed mode, sharing) > MacOS Mojave >Reporter: Alex Rebert >Assignee: Andreas Lehmkühler >Priority: Minor > Fix For: 2.0.26, 3.0.0 PDFBox > > Attachments: infinite-recursion.pdf, loop_in_page_tree.pdf, > poppler-43279-0.pdf, poppler-91414-1.zip-2.gz-53.pdf > > > Parsing an invalid PDF can lead to an infinite recursion in COSParser, which > results in a StackOverflowError. 
> *Steps to repro* > # Download malformed PDF (attached) > # {{Run: java -jar pdfbox-app-2.0.16.jar ExtractText infinite-recursion.pdf}} > *Stacktrace* > {noformat} > Exception in thread "main" java.lang.StackOverflowError [1005/1916] > at java.base/sun.nio.cs.UTF_8.updatePositions(UTF_8.java:79) > at java.base/sun.nio.cs.UTF_8$Decoder.xflow(UTF_8.java:210) > at java.base/sun.nio.cs.UTF_8$Decoder.decodeArrayLoop(UTF_8.java:321) > at java.base/sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:414) > at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:578) > at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:801) > at org.apache.pdfbox.pdfparser.BaseParser.isValidUTF8(BaseParser.java:787) > at org.apache.pdfbox.pdfparser.BaseParser.parseCOSName(BaseParser.java:768) > at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:887) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:283) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:216) > at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:867) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:912) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at 
org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > ... > {noformat} > The file was generated by fuzzing and is (probably) not a valid PDF file. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
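The approach named in the commit message for PDFBOX-4623 (marking indirect objects while dereferencing is in progress, so a cycle is detected instead of overflowing the stack) can be sketched as follows. This is a toy model under stated assumptions, not the COSParser implementation: the class `DereferenceGuardSketch`, its integer object numbers, the `references` map and the exception text are all hypothetical.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of indirect-object resolution; not PDFBox code.
class DereferenceGuardSketch {
    // object number -> object number it refers to (-1 means "no further reference")
    private final Map<Integer, Integer> references = new HashMap<>();
    // objects whose dereferencing is currently in progress
    private final Set<Integer> inProgress = new HashSet<>();

    void addObject(int objectNumber, int referencedObject) {
        references.put(objectNumber, referencedObject);
    }

    // Follows references until an object without a further reference is reached.
    int resolve(int objectNumber) throws IOException {
        if (!inProgress.add(objectNumber)) {
            // Already being dereferenced further up the call chain: a cycle.
            throw new IOException("Possible recursion detected when dereferencing object "
                    + objectNumber);
        }
        try {
            Integer next = references.get(objectNumber);
            if (next == null || next < 0) {
                return objectNumber;
            }
            return resolve(next);
        } finally {
            // Clear the mark so later, unrelated lookups are not rejected.
            inProgress.remove(objectNumber);
        }
    }
}
```

A chain 1 -> 2 -> 3 resolves to 3, while a cycle 1 -> 2 -> 1 raises an `IOException` rather than a `StackOverflowError`, which matches the outcome described for the 2.0.x fix.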
[jira] [Updated] (PDFBOX-4623) COSParser: Infinite recursion
[ https://issues.apache.org/jira/browse/PDFBOX-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-4623: --- Affects Version/s: 2.0.25 3.0.0 PDFBox (was: 2.0.16) > COSParser: Infinite recursion > - > > Key: PDFBOX-4623 > URL: https://issues.apache.org/jira/browse/PDFBOX-4623 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.25, 3.0.0 PDFBox > Environment: java version "12" 2019-03-19 > Java(TM) SE Runtime Environment (build 12+33) > Java HotSpot(TM) 64-Bit Server VM (build 12+33, mixed mode, sharing) > MacOS Mojave >Reporter: Alex Rebert >Assignee: Andreas Lehmkühler >Priority: Minor > Fix For: 2.0.26, 3.0.0 PDFBox > > Attachments: infinite-recursion.pdf, loop_in_page_tree.pdf, > poppler-43279-0.pdf, poppler-91414-1.zip-2.gz-53.pdf > > > Parsing an invalid PDF can lead to an infinite recursion in COSParser, which > results in a StackOverflowError. > *Steps to repro* > # Download malformed PDF (attached) > # {{Run: java -jar pdfbox-app-2.0.16.jar ExtractText infinite-recursion.pdf}} > *Stacktrace* > {noformat} > Exception in thread "main" java.lang.StackOverflowError [1005/1916] > at java.base/sun.nio.cs.UTF_8.updatePositions(UTF_8.java:79) > at java.base/sun.nio.cs.UTF_8$Decoder.xflow(UTF_8.java:210) > at java.base/sun.nio.cs.UTF_8$Decoder.decodeArrayLoop(UTF_8.java:321) > at java.base/sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:414) > at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:578) > at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:801) > at org.apache.pdfbox.pdfparser.BaseParser.isValidUTF8(BaseParser.java:787) > at org.apache.pdfbox.pdfparser.BaseParser.parseCOSName(BaseParser.java:768) > at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:887) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154) > at > 
org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:283) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:216) > at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:867) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:912) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > ... > {noformat} > The file was generated by fuzzing and is (probably) not a valid PDF file. 
[jira] [Reopened] (PDFBOX-4623) COSParser: Infinite recursion
[ https://issues.apache.org/jira/browse/PDFBOX-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler reopened PDFBOX-4623: We got another PDF triggering the very same exception in 2.0.x from Jin Wang through the security mailing list. I had another look at the code and I proved myself wrong. I've found a solution for 2.0.x. It avoids the stack overflow error and throws an IOException instead. This is true for [^infinite-recursion.pdf] > COSParser: Infinite recursion > - > > Key: PDFBOX-4623 > URL: https://issues.apache.org/jira/browse/PDFBOX-4623 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.16 > Environment: java version "12" 2019-03-19 > Java(TM) SE Runtime Environment (build 12+33) > Java HotSpot(TM) 64-Bit Server VM (build 12+33, mixed mode, sharing) > MacOS Mojave >Reporter: Alex Rebert >Assignee: Andreas Lehmkühler >Priority: Minor > Fix For: 3.0.0 PDFBox > > Attachments: infinite-recursion.pdf, loop_in_page_tree.pdf, > poppler-43279-0.pdf, poppler-91414-1.zip-2.gz-53.pdf > > > Parsing an invalid PDF can lead to an infinite recursion in COSParser, which > results in a StackOverflowError. 
> *Steps to repro* > # Download malformed PDF (attached) > # {{Run: java -jar pdfbox-app-2.0.16.jar ExtractText infinite-recursion.pdf}} > *Stacktrace* > {noformat} > Exception in thread "main" java.lang.StackOverflowError [1005/1916] > at java.base/sun.nio.cs.UTF_8.updatePositions(UTF_8.java:79) > at java.base/sun.nio.cs.UTF_8$Decoder.xflow(UTF_8.java:210) > at java.base/sun.nio.cs.UTF_8$Decoder.decodeArrayLoop(UTF_8.java:321) > at java.base/sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:414) > at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:578) > at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:801) > at org.apache.pdfbox.pdfparser.BaseParser.isValidUTF8(BaseParser.java:787) > at org.apache.pdfbox.pdfparser.BaseParser.parseCOSName(BaseParser.java:768) > at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:887) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:283) > at > org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:216) > at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:867) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:912) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at 
org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:920) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:881) > at > org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:801) > at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:1055) > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1114) > ... > {noformat} > The file was generated by fuzzing and is (probably) not a valid PDF file.
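The repeating frames in the trace (parseCOSStream, getLength, parseObjectDynamically, parseFileObject, then parseCOSStream again) suggest a stream whose length is an indirect reference that leads back to an object already being parsed. The following is only a minimal sketch of one common way to turn such a loop into an IOException. It is not the actual PDFBox change, and the class LengthCycleGuard, the type Ref, and the method resolveLength are hypothetical names made up for this illustration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model, not PDFBox code: an object is either a direct length or a reference to another object. */
public class LengthCycleGuard {
    /** Hypothetical indirect reference to another object number. */
    public static final class Ref {
        final int target;
        public Ref(int target) { this.target = target; }
    }

    private final Map<Integer, Object> objects;
    // Object numbers whose resolution has started but not yet finished.
    private final Set<Integer> inProgress = new HashSet<>();

    public LengthCycleGuard(Map<Integer, Object> objects) {
        this.objects = objects;
    }

    /** Follows references until a direct length is found; fails fast on a reference loop. */
    public int resolveLength(int objNumber) throws IOException {
        // add() returns false if the number is already being resolved further up the call stack.
        if (!inProgress.add(objNumber)) {
            throw new IOException("Possible recursion detected when resolving object " + objNumber);
        }
        try {
            Object value = objects.get(objNumber);
            if (value instanceof Integer) {
                return (Integer) value;
            }
            if (value instanceof Ref) {
                return resolveLength(((Ref) value).target);
            }
            throw new IOException("Missing or unsupported object " + objNumber);
        } finally {
            // Always clear the marker so later, unrelated lookups are not rejected.
            inProgress.remove(objNumber);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Object> objects = new HashMap<>();
        objects.put(1, new Ref(2)); // valid chain: 1 -> 2 -> direct length
        objects.put(2, 42);
        objects.put(3, new Ref(4)); // malformed: 3 -> 4 -> 3
        objects.put(4, new Ref(3));
        LengthCycleGuard guard = new LengthCycleGuard(objects);
        try {
            System.out.println("Length of object 1: " + guard.resolveLength(1));
            guard.resolveLength(3);
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The caller then sees a checked exception it can handle instead of a StackOverflowError, and the guard costs one set lookup per resolved object.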
[jira] [Updated] (PDFBOX-4623) COSParser: Infinite recursion
[ https://issues.apache.org/jira/browse/PDFBOX-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-4623: --- Fix Version/s: 2.0.26 > COSParser: Infinite recursion > - > > Key: PDFBOX-4623 > URL: https://issues.apache.org/jira/browse/PDFBOX-4623 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.16 > Environment: java version "12" 2019-03-19 > Java(TM) SE Runtime Environment (build 12+33) > Java HotSpot(TM) 64-Bit Server VM (build 12+33, mixed mode, sharing) > MacOS Mojave >Reporter: Alex Rebert >Assignee: Andreas Lehmkühler >Priority: Minor > Fix For: 2.0.26, 3.0.0 PDFBox > > Attachments: infinite-recursion.pdf, loop_in_page_tree.pdf, > poppler-43279-0.pdf, poppler-91414-1.zip-2.gz-53.pdf > > > Parsing an invalid PDF can lead to an infinite recursion in COSParser, which > results in a StackOverflowError.