[ https://issues.apache.org/jira/browse/PDFBOX-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Ringer updated PDFBOX-1263:
---------------------------------

    Description: 
The attached patch reworks how Overlay.java rewrites content streams to avoid 
name clashes between resource dictionaries.

Prior to this patch, Overlay appends "overlay" to every name in the Font, 
XObject and ExtGState resource dictionaries, then rewrites the content 
stream(s) in the overlay PDF to reference the new names using a simple 
hand-rolled find-and-replace process. It doesn't check for over-length names, 
and it doesn't check that the newly generated names don't clash. Because PDFs 
often use the same names for their resources, this quickly becomes a problem 
when you're doing multiple overlays. Multiple overlays become more likely with 
https://issues.apache.org/jira/browse/PDFBOX-1255, but are already useful with 
stock PDFBox.
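
To make the failure concrete, here is a minimal sketch of the old behaviour. 
It does not use PDFBox and is not code from Overlay.java: the class and method 
names are hypothetical, and a resource dictionary is reduced to a plain set of 
names. It only shows why a fixed suffix with no conflict check breaks on the 
second overlay.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not taken from Overlay.java.
class BlindSuffixDemo {
    /** Old-style rename: append a fixed suffix with no conflict check. */
    static String blindRename(String name) {
        return name + "overlay";
    }

    /**
     * Adds the renamed overlay names to the page's names and returns
     * the renamed names that were already present (the clashes).
     */
    static Set<String> mergeOverlay(Set<String> pageNames, List<String> overlayNames) {
        Set<String> clashes = new LinkedHashSet<>();
        for (String name : overlayNames) {
            String renamed = blindRename(name);
            if (!pageNames.add(renamed)) {
                clashes.add(renamed);
            }
        }
        return clashes;
    }

    public static void main(String[] args) {
        Set<String> page = new LinkedHashSet<>(List.of("F1"));
        // First overlay: F1 becomes F1overlay, no clash yet.
        System.out.println(mergeOverlay(page, List.of("F1"))); // prints []
        // Second overlay: F1 becomes F1overlay again, which is now taken.
        System.out.println(mergeOverlay(page, List.of("F1"))); // prints [F1overlay]
    }
}
```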

This patch alters Overlay so that it only renames objects from the overlay PDF 
when their names conflict with names in the PDF being overlaid upon. It also 
uses a name generation strategy that checks for conflicts and for over-length 
names, so multiple overlays will work much better. The patch uses the 
PDFStreamProcessor (a simplified base extracted from PDFStreamEngine by 
https://issues.apache.org/jira/browse/PDFBOX-1256) to copy each content stream 
from the overlay PDF to a ContentStreamWriter. While copying, it checks for 
names that reference renamed resources and substitutes the new name before 
writing each operator and its arguments to the output stream.
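
The approach described above can be sketched roughly as follows. This is not 
the code from the patch and does not use PDFBox: ResourceRenamer, freshName, 
planRenames and rewriteOperands are hypothetical names, the "_N" suffix scheme 
and the 127-byte limit (with ASCII-only names) are assumptions, a content 
stream is reduced to a list of string tokens, and a single namespace stands in 
for the separate Font, XObject and ExtGState dictionaries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and the suffix scheme are assumptions.
class ResourceRenamer {
    // Assumed maximum name length; names are assumed to be ASCII.
    static final int MAX_NAME_LENGTH = 127;

    /** Returns a name derived from 'name' that is unused and not over-length. */
    static String freshName(String name, Set<String> taken) {
        for (int i = 0; ; i++) {
            String suffix = "_" + i;
            // Truncate the base so that base + suffix fits the limit.
            int keep = Math.min(name.length(), MAX_NAME_LENGTH - suffix.length());
            String candidate = name.substring(0, keep) + suffix;
            if (!taken.contains(candidate)) {
                return candidate;
            }
        }
    }

    /** Plans renames only for overlay names that clash with page names. */
    static Map<String, String> planRenames(Set<String> pageNames, List<String> overlayNames) {
        Set<String> taken = new HashSet<>(pageNames);
        taken.addAll(overlayNames); // never rename onto another overlay name
        Map<String, String> renames = new HashMap<>();
        for (String name : overlayNames) {
            if (pageNames.contains(name)) {
                String fresh = freshName(name, taken);
                taken.add(fresh);
                renames.put(name, fresh);
            }
        }
        return renames;
    }

    /** Copies tokens, substituting renamed resources in name operands ("/Name"). */
    static List<String> rewriteOperands(List<String> tokens, Map<String, String> renames) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            if (token.startsWith("/")) {
                String name = token.substring(1);
                out.add("/" + renames.getOrDefault(name, name));
            } else {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> renames = planRenames(Set.of("F1", "Im0"), List.of("F1", "F2"));
        System.out.println(renames); // prints {F1=F1_0}
        List<String> stream = List.of("BT", "/F1", "12", "Tf", "(Hi)", "Tj", "ET");
        System.out.println(rewriteOperands(stream, renames));
        // prints [BT, /F1_0, 12, Tf, (Hi), Tj, ET]
    }
}
```

F2 is left alone because it does not clash, which is the difference from the 
old rename-everything behaviour.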

The main benefit of this patch is that it enables multiple overlays without 
name clashes.

A secondary benefit of this patch is that it eliminates Overlay.java-specific 
code in favour of facilities provided by the rest of PDFBox. That makes 
Overlay a better example, helps it exercise the rest of PDFBox more thoroughly, 
and lets it benefit from improvements in PDFBox's stream processor and writer.

  was:
The attached patch reworks how Overlay.java rewrites content streams to avoid 
name clashes between resource dictionaries.

Prior to this patch, Overlay appends "overlay" to every name in the Font, 
XObject and ExtGState resource dictionaries, then rewrites the content 
stream(s) in the overlay PDF to reference the new names using a simple 
hand-rolled find-and-replace process. It doesn't check for over-length names, 
and it doesn't check that the newly generated names don't clash. Because PDFs 
often use the same names for their resources, this quickly becomes a problem 
when you're doing multiple overlays. Multiple overlays become more likely with 
https://issues.apache.org/jira/browse/PDFBOX-1255, but are already useful with 
stock PDFBox.

This patch alters Overlay so that it only renames objects from the overlay PDF 
when their names conflict with names in the PDF being overlaid upon. It also 
uses a name generation strategy that checks for conflicts and for over-length 
names, so multiple overlays will work much better. The patch uses the 
PDFStreamProcessor (a simplified base extracted from PDFStreamEngine by 
https://issues.apache.org/jira/browse/PDFBOX-1256) to copy each content stream 
from the overlay PDF to a PDFStreamWriter. While copying, it checks for names 
that reference renamed resources and substitutes the new name before writing 
each operator and its arguments to the output stream.

The main benefit of this patch is that it enables multiple overlays without 
name clashes.

A secondary benefit of this patch is that it eliminates Overlay.java-specific 
code in favour of facilities provided by the rest of PDFBox. That makes 
Overlay a better example, helps it exercise the rest of PDFBox more thoroughly, 
and lets it benefit from improvements in PDFBox's stream processor and writer.

    
> [PATCH] Rewrite Overlay.java's stream rewriting and rsrc dict renaming to use 
> PDFStreamProcessor
> ------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1263
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1263
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.7.0
>         Environment: N/A
>            Reporter: Craig Ringer
>            Priority: Minor
>              Labels: newbie, overlay, patch, refactoring
>         Attachments: 
> 0003-Major-rework-of-Overlay.java-to-use-PDFStreamProcess.patch
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
