[ https://issues.apache.org/jira/browse/PDFBOX-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275604#comment-17275604 ]
ASF subversion and git services commented on PDFBOX-5090:
---------------------------------------------------------

Commit 1886054 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1886054 ]
PDFBOX-5090: test strict mode with overflow detection

> Missing text extraction under certain conditions starting with apache pdfbox 2.0.18
> -----------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5090
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5090
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.18, 2.0.19, 2.0.20, 2.0.21, 2.0.22
>         Environment: jdk 1.8, apache pdfbox, fontbox 2.0.18~, windows 10
>            Reporter: sungwon kim
>            Priority: Major
>              Labels: regression
>             Fix For: 2.0.23, 3.0.0 PDFBox
>
>         Attachments: 128채널심장전기도시스템을위한3차원매핑소프트웨어개발.pdf,
> 128채널심장전기도시스템을위한3차원매핑소프트웨어개발.txt,
> 128채널심장전기도시스템을위한3차원매핑소프트웨어개발_2p_left_botton.PNG, PDFBOX-5090_reduced.pdf,
> textstripper_2.0.17_128채널심장전기도시스템을위한3차원매핑소프트웨어개발_2p_left_botton.PNG,
> textstripper_2.0.17_独立財政機関をめぐる論点整理_3p_top.PNG,
> textstripper_2.0.18_128채널심장전기도시스템을위한3차원매핑소프트웨어개발_2p_left_botton.PNG,
> textstripper_2.0.18_独立財政機関をめぐる論点整理_3p_top.PNG, 独立財政機関をめぐる論点整理.pdf,
> 独立財政機関をめぐる論点整理_3p_top.PNG
>
>
> When calling PDFTextStripper.getText() on PDFBox 2.0.18 or later, some text
> is not extracted under certain conditions.
> The missing text is suspected to be related to the font type, the font size,
> or the width and height of the text.
> I have attached the text extraction results of versions 2.0.17 and 2.0.18,
> along with the sample data used for the test.
> code
>
> {code:java}
> try (PDDocument pdDocument = PDDocument.load(new File(path))) {
>     PDFTextStripper stripper = new PDFTextStripper();
>     String text = stripper.getText(pdDocument);
> }
> {code}
> dependencies
>
> {code:xml}
> <properties>
>     <apache.pdfbox.version>2.0.18</apache.pdfbox.version>
> </properties>
> <dependencies>
>     <dependency>
>         <groupId>org.apache.pdfbox</groupId>
>         <artifactId>pdfbox</artifactId>
>         <version>${apache.pdfbox.version}</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.pdfbox</groupId>
>         <artifactId>fontbox</artifactId>
>         <version>${apache.pdfbox.version}</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.pdfbox</groupId>
>         <artifactId>xmpbox</artifactId>
>         <version>${apache.pdfbox.version}</version>
>     </dependency>
> </dependencies>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
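To narrow down which text goes missing between versions, one option is to save the `PDFTextStripper.getText()` output from 2.0.17 and from 2.0.18 and compare them line by line. The sketch below assumes both outputs are already available as strings. `ExtractionDiff` and `missingLines` are hypothetical names and are not part of the PDFBox API. It uses a simple multiset comparison of trimmed, non-blank lines rather than a full diff algorithm. As a result, a line that changed its internal spacing or was split differently will also be reported as missing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of PDFBox): lists lines that appear in a
 * reference extraction (e.g. saved from PDFBox 2.0.17) but are missing, or
 * appear fewer times, in a candidate extraction (e.g. from PDFBox 2.0.18).
 */
final class ExtractionDiff {

    private ExtractionDiff() {
    }

    public static List<String> missingLines(String reference, String candidate) {
        // Count how often each trimmed, non-blank line occurs in the candidate output.
        Map<String, Integer> available = new HashMap<>();
        for (String line : candidate.split("\\R")) {
            String key = line.trim();
            if (!key.isEmpty()) {
                available.merge(key, 1, Integer::sum);
            }
        }
        // Walk the reference output in order; each line that cannot be matched
        // against a remaining candidate occurrence is reported as missing.
        List<String> missing = new ArrayList<>();
        for (String line : reference.split("\\R")) {
            String key = line.trim();
            if (key.isEmpty()) {
                continue;
            }
            Integer count = available.get(key);
            if (count == null || count == 0) {
                missing.add(key);
            } else {
                available.put(key, count - 1);
            }
        }
        return missing;
    }
}
```

For example, `ExtractionDiff.missingLines(text2017, text2018)` returns the lines of the 2.0.17 output that have no counterpart in the 2.0.18 output. The variable names are illustrative. Counting occurrences rather than using a plain set means that repeated lines, such as running headers, cannot hide a genuinely missing copy.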