[jira] [Updated] (PDFBOX-3931) Losing fonts (embedded subset) when merge documents with PDFMergerUtility

Nazar Dub (JIRA) Fri, 15 Sep 2017 06:21:22 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Nazar Dub updated PDFBOX-3931:
------------------------------
    Description: 
*Story:*
I want to merge two _PDDocument_ instances with:
{code:java}
PDFMergerUtility#appendDocument(PDDocument destination, PDDocument source)
{code}
Both documents are created from scratch in Java. For each document I open a 
_PDPageContentStream_, add some text and then close the _PDPageContentStream_. 
For each document I use a _PDFont_ that is loaded by the following code:
{code:java}
// Loads Calibri from the classpath as a Type 0 font for the given document
PDFont getFont(PDDocument document) throws IOException {
    InputStream fontStream = Thread.currentThread().getContextClassLoader()
            .getResourceAsStream("font/Calibri.ttf");
    return PDType0Font.load(document, fontStream, true);
}
// Note that the subset flag (third argument) is true
{code}
Then I merge the documents:
{code:java}
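// appendDocument is an instance method, so a PDFMergerUtility is needed
PDFMergerUtility mergerUtility = new PDFMergerUtility();
mergerUtility.appendDocument(document1, document2);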
{code}
Then I close *document2*:
{code:java}
document2.close();
{code}
And save *document1* to an _OutputStream_:
{code:java}
document1.save(someOutputStream);
{code}

*Expected results:*
I get a PDF file with all fonts embedded as subsets.

*Actual result:*
The font is embedded correctly only for pages created in *document1*. Pages 
created in *document2* are present, but no font is embedded for them. 
As a result, if I open the resulting PDF file on an OS that has Calibri.ttf 
installed, I see the correct font on all pages; if Calibri.ttf is not 
installed, the font is correct only on pages created in *document1*.

*Used workaround:*
I see that _PDDocument_ has the field:
{code:java}
// fonts to subset before saving
private final Set<PDFont> fontsToSubset = new HashSet<PDFont>();
{code}
Fonts are added to this field when the client calls:
{code:java}
PDPageContentStream#setFont(PDFont font, float fontSize)
{code}
and the actual embedding happens in the method:
{code:java}
PDDocument#save(OutputStream output);
{code}
In my example above, *save* is never called for *document2*.
We append *document2* to *document1* and *save* only *document1*.
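
A minimal sketch of the mechanism described above, written as a toy model in plain Java rather than against PDFBox itself. _ToyFont_, _ToyDocument_ and _append_ are hypothetical stand-ins invented for illustration; the only assumption taken from the report is that fonts are registered when text is written and subset only when the owning document is saved.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are PDFBox classes.
class DeferredSubsetModel {

    /** Hypothetical font that counts as embedded only after subset() has run. */
    static final class ToyFont {
        final String name;
        boolean embedded = false;

        ToyFont(String name) {
            this.name = name;
        }

        void subset() {
            embedded = true;
        }
    }

    /** Hypothetical document that defers subsetting until save(). */
    static final class ToyDocument {
        final List<String> pages = new ArrayList<>();
        private final Set<ToyFont> fontsToSubset = new LinkedHashSet<>();

        // Writing text registers the font for later subsetting.
        void addPage(String text, ToyFont font) {
            fontsToSubset.add(font);
            pages.add(text);
        }

        // Only fonts registered on THIS document are subset here.
        void save() {
            for (ToyFont font : fontsToSubset) {
                font.subset();
            }
            fontsToSubset.clear();
        }
    }

    // Copies pages only; the source's pending font set is not touched.
    static void append(ToyDocument destination, ToyDocument source) {
        destination.pages.addAll(source.pages);
    }

    /** Returns { font1 embedded, font2 embedded, merged document has two pages }. */
    static boolean[] run() {
        ToyFont font1 = new ToyFont("Calibri (document1)");
        ToyFont font2 = new ToyFont("Calibri (document2)");
        ToyDocument document1 = new ToyDocument();
        ToyDocument document2 = new ToyDocument();
        document1.addPage("page A", font1);
        document2.addPage("page B", font2);

        append(document1, document2);
        document1.save(); // save() is never called on document2

        return new boolean[] { font1.embedded, font2.embedded, document1.pages.size() == 2 };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("font1 embedded: " + result[0]); // prints "font1 embedded: true"
        System.out.println("font2 embedded: " + result[1]); // prints "font2 embedded: false"
    }
}
```

In this model the merged document contains both pages, but the font registered on the second document is never subset, which matches the behaviour I observe.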

I reviewed the method:
{code:java}
PDFMergerUtility#appendDocument(PDDocument destination, PDDocument source)
{code}
And I did not find that this method does anything with the *fontsToSubset* field.
So I created the following method:
{code:java}
@SuppressWarnings("unchecked")
private static void subsetFonts(final PDDocument document) {
    try {
        // fontsToSubset is private in PDDocument, so it is read via reflection
        Field fontsToSubsetField = document.getClass().getDeclaredField("fontsToSubset");
        fontsToSubsetField.setAccessible(true);
        Set<PDFont> fontsToSubset = (Set<PDFont>) fontsToSubsetField.get(document);
        // Subset each pending font now, because save() is never called on this document
        for (PDFont font : fontsToSubset) {
            font.subset();
        }
    } catch (NoSuchFieldException | IOException | IllegalAccessException | ClassCastException e) {
        LOGGER.warn("Error while subsetting embedded fonts in the PDF document", e);
    }
}
{code}

And I call it before merging the documents:
{code:java}
subsetFonts(document2);
mergerUtility.appendDocument(document1, document2);
{code}
(I need to use reflection because *fontsToSubset* is a private field of 
_PDDocument_.)

I think another, maybe better, option would be:
{code:java}
// Pseudocode: fontsToSubset is private, so this would have to be done inside the library
document1.fontsToSubset.addAll(document2.fontsToSubset);
{code}
But I have not tested this option.
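
A minimal sketch of that untested idea, again as a toy model in plain Java and not as PDFBox code. _ToyFont_, _ToyDocument_ and _appendWithFonts_ are hypothetical names; the sketch only assumes that the destination subsets every font in its own pending set when it is saved. It says nothing about whether real font data stays readable after the source document is closed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are PDFBox classes.
class PendingFontTransferModel {

    /** Hypothetical font that counts as embedded only after subset() has run. */
    static final class ToyFont {
        boolean embedded = false;

        void subset() {
            embedded = true;
        }
    }

    /** Hypothetical document that defers subsetting until save(). */
    static final class ToyDocument {
        final List<String> pages = new ArrayList<>();
        final Set<ToyFont> fontsToSubset = new LinkedHashSet<>();

        void addPage(String text, ToyFont font) {
            fontsToSubset.add(font);
            pages.add(text);
        }

        void save() {
            for (ToyFont font : fontsToSubset) {
                font.subset();
            }
            fontsToSubset.clear();
        }
    }

    // Copies the pages AND hands the source's pending fonts to the destination.
    static void appendWithFonts(ToyDocument destination, ToyDocument source) {
        destination.pages.addAll(source.pages);
        destination.fontsToSubset.addAll(source.fontsToSubset);
    }

    /** Returns true when both fonts are embedded after saving only the destination. */
    static boolean allFontsEmbeddedAfterMerge() {
        ToyFont font1 = new ToyFont();
        ToyFont font2 = new ToyFont();
        ToyDocument document1 = new ToyDocument();
        ToyDocument document2 = new ToyDocument();
        document1.addPage("page A", font1);
        document2.addPage("page B", font2);

        appendWithFonts(document1, document2);
        document1.save(); // document2 is still never saved

        return font1.embedded && font2.embedded;
    }

    public static void main(String[] args) {
        // prints "all fonts embedded: true"
        System.out.println("all fonts embedded: " + allFontsEmbeddedAfterMerge());
    }
}
```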

*Conclusion:*
I think this problem should be solved on the library side, in the 
_PDFMergerUtility#appendDocument_ method, and not in client code. Otherwise, 
the Javadoc should state that _PDFMergerUtility#appendDocument_ must be used 
only with saved _PDDocument_ instances.



> Losing fonts (embedded subset) when merge documents with PDFMergerUtility
> -------------------------------------------------------------------------
>
>                 Key: PDFBOX-3931
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3931
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel, Utilities
>    Affects Versions: 2.0.7
>            Reporter: Nazar Dub
>



