[
https://issues.apache.org/jira/browse/PDFBOX-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Soocheon Kim reopened PDFBOX-4275:
----------------------------------
(Sorry that I couldn't follow up for a few days.)
PDFBox can't extract diagonal text. I ran another test today.
1. The content of my PDF file looks like this:
!image-2018-07-31-23-38-32-829.png!
2. My test program is as follows:
PDDocument doc = null;
try {
    doc = PDDocument.load(file);
    PDFTextStripper parser = new PDFTextStripper();
    String text = parser.getText(doc);
    System.out.println(text);
} finally {
    if (doc != null)
        doc.close();
}
3. The results are as follows:
1111
5
5
5
5
7777
9
9
9
9
PDFTextStripper extracts only text rotated by 0, 90, 180 or 270 degrees.
Overriding PDFStreamEngine.showGlyph(...) shows the same behavior.
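To make "diagonal" concrete, here is a minimal sketch of how the rotation of a glyph could be estimated from the first row (a, b) of its text rendering matrix [a b 0; c d 0; e f 1]. The class SlantedTextAngle and its methods are hypothetical helpers, not part of PDFBox. I assume that in PDFBox 2.0.x the two values can be read from the Matrix passed to showGlyph(...) through getScaleX() and getShearY(), but that mapping should be verified against the Javadoc.

```java
// Hypothetical helper, not part of PDFBox: estimates the rotation of a glyph
// from the first row (a, b) of its text rendering matrix [a b 0; c d 0; e f 1].
// For a pure rotation by theta, a = cos(theta) and b = sin(theta).
class SlantedTextAngle {

    /** Returns the counter-clockwise rotation in degrees, between -180 and 180. */
    static double rotationDegrees(double a, double b) {
        return Math.toDegrees(Math.atan2(b, a));
    }

    /** True when the angle is farther than the tolerance from every multiple of 90 degrees. */
    static boolean isSlanted(double a, double b, double toleranceDegrees) {
        double remainder = Math.abs(rotationDegrees(a, b)) % 90.0;
        // Distance to the nearest multiple of 90 degrees, in either direction.
        double distance = Math.min(remainder, 90.0 - remainder);
        return distance > toleranceDegrees;
    }

    public static void main(String[] args) {
        double s = Math.sqrt(0.5); // cos(45 deg) == sin(45 deg)
        System.out.println("45-degree text slanted?  " + isSlanted(s, s, 1.0));
        System.out.println("90-degree text slanted?  " + isSlanted(0.0, 1.0, 1.0));
        System.out.println("Angle for (0.5, sin 60): "
                + rotationDegrees(0.5, Math.sin(Math.toRadians(60))));
    }
}
```

Logging isSlanted(...) for each glyph inside an overridden showGlyph(...) would show whether the slanted glyphs reach the stream engine at all, or whether they are lost later during text assembly.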
> Can't extract slanted text through the parsers of the PDFBox
> ------------------------------------------------------------
>
> Key: PDFBOX-4275
> URL: https://issues.apache.org/jira/browse/PDFBOX-4275
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing, Text extraction
> Affects Versions: 2.0.10
> Environment: I tested this in the overridden showGlyph() method of my
> class extending PDFStreamEngine, PDFGraphicsStreamEngine or PDFTextStripper.
> Reporter: Soocheon Kim
> Priority: Major
>
> PDFBox (the stream engine) extracts only text that is rotated by 0, 90, 180
> or -90 degrees.
> For example, it can't extract text rotated by 45 or 60 degrees.