[
https://issues.apache.org/jira/browse/PDFBOX-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman updated PDFBOX-3710:
--------------------------
Description:
After migrating our app from PDFBox 1.8 to 2.0, we noticed a regression: 4
lines of text have disappeared. These are the three lines of text with black
bullets and also the word "OVERALL", which is placed above, in the table.
Problematic PDF attached - [^highlight19.pdf_page1.pdf]
I have also attached the result of the
[DrawPrintTextLocations|https://apache.googlesource.com/pdfbox/+/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/DrawPrintTextLocations.java]
example -
[highlight19.pdf_page1-marked-1.png|https://issues.apache.org/jira/secure/attachment/12856229/highlight19.pdf_page1-marked-1.png]
Notice that the Unicode values and the red and blue boxes are missing for the
problematic text. The main problem is that these glyphs are absent from the
*textPositions* parameter that is passed to the *writeString* function, line
#275. In version 1.8 these characters ARE present, so their positions, along
with their char codes, could be extracted without problems in our app.
I have also attached a picture of the regression in our app -
[^regression_in_blue.png]. There, blue boxes are drawn where text WAS present
and has since disappeared. (The purple boxes are OK and should be ignored.)
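For illustration only, here is a minimal sketch of how a before/after
comparison like the one behind the blue boxes could be done. It uses plain
Java and no PDFBox calls. The names GlyphRegressionCheck, Glyph and missingIn
are hypothetical, the sample glyphs are made up, and positions are assumed to
be already rounded to integers; this is not the actual code of our app.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GlyphRegressionCheck {

    /** Hypothetical record of one extracted glyph: its text and rounded position. */
    public static final class Glyph {
        public final String unicode;
        public final int x;
        public final int y;

        public Glyph(String unicode, int x, int y) {
            this.unicode = unicode;
            this.x = x;
            this.y = y;
        }

        /** Key used to match the same glyph across two extraction runs. */
        String key() {
            return unicode + "@" + x + "," + y;
        }

        @Override
        public String toString() {
            return "'" + unicode + "' at (" + x + ", " + y + ")";
        }
    }

    /** Returns the glyphs found in the "before" run that are absent from the "after" run. */
    public static List<Glyph> missingIn(List<Glyph> before, List<Glyph> after) {
        Set<String> seen = new HashSet<>();
        for (Glyph g : after) {
            seen.add(g.key());
        }
        List<Glyph> missing = new ArrayList<>();
        for (Glyph g : before) {
            if (!seen.contains(g.key())) {
                missing.add(g);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for glyphs collected with 1.8 and 2.0.
        List<Glyph> with18 = new ArrayList<>();
        with18.add(new Glyph("O", 100, 200));
        with18.add(new Glyph("V", 108, 200));
        with18.add(new Glyph("A", 50, 300));

        List<Glyph> with20 = new ArrayList<>();
        with20.add(new Glyph("A", 50, 300));

        // Each glyph reported here would get a blue box in the comparison picture.
        for (Glyph g : missingIn(with18, with20)) {
            System.out.println("missing: " + g);
        }
    }
}
```

In a real comparison the two lists would be filled from the *textPositions*
passed to *writeString* in each PDFBox version, and exact matching on rounded
coordinates would probably need a small tolerance.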
> Text Stripper in 2.0 lost some texts - regression
> -------------------------------------------------
>
> Key: PDFBOX-3710
> URL: https://issues.apache.org/jira/browse/PDFBOX-3710
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Reporter: Roman
> Attachments: highlight19.pdf_page1-marked-1.png,
> highlight19.pdf_page1.pdf, regression_in_blue.png