Tilman Hausherr created PDFBOX-2772:
---------------------------------------

             Summary: EI token lost for rewrite
                 Key: PDFBOX-2772
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2772
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing, Writing
    Affects Versions: 1.8.9, 1.8.10, 2.0.0
            Reporter: Tilman Hausherr
            Assignee: Tilman Hausherr
             Fix For: 1.8.10, 2.0.0


From Lukas S. on the dev mailing list:
{quote}
A co-worker and I are currently developing a service, based on PDFBox, for 
searching and replacing content in PDF documents. We started our project with 
version 1.8.2 of PDFBox and have recently been trying to migrate to 1.8.8.

After switching to version 1.8.8 we ran into trouble with PDF content that 
contains inline images. Comparing the code of the two versions led us to the 
handling of the EI operator as the cause.

In version 1.8.2 the method parseNextToken() of the 
org.apache.pdfbox.pdfparser.PDFStreamParser does an unread of the EI token on 
inline images. In newer versions this unread of the EI token doesn't exist 
anymore with the following comment "// the EI operator isn't unread, as it 
won't be processed anyway".

As a consequence, the token list that the PDFStreamParser delivers for a 
document containing an inline image can't be used by the ContentStreamWriter 
to (re)write a valid PDF document.
The reason is the missing token for the EI operator. The EI token may not 
trigger any further processing, but it is still needed to represent the 
delimiter in the token sequence.

Conversely, if an inline image is to be added to a PDF page by inserting its 
tokens manually, the EI token must also be present in the token list so that 
the ContentStreamWriter is able to create a correct PDF document.

From our point of view there are two simple approaches to get a more 
consistent internal representation of inline images in PDFBox: either 
represent the EI operator explicitly as a token (reverting to the handling 
in version 1.8.2), or extend the writeObject method in the 
ContentStreamWriter to append the EI operator implicitly.
{quote}
THAT is what I call an excellent bug report :-) I think that the 2nd solution 
you suggested is the better one.
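
A minimal sketch of that second approach, assuming a simplified token model
rather than PDFBox's own classes: the hypothetical Op class stands in for an
operator token that carries the inline image parameters and data, and the
hypothetical write method plays the role of the writer. The writer appends the
EI delimiter itself whenever it emits inline image data, so the token list
does not need an explicit EI token.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class InlineImageWriterSketch {

    /** Hypothetical stand-in for an operator token; not a PDFBox class. */
    static final class Op {
        final String name;
        final String imageParams; // null unless this is a "BI" operator
        final byte[] imageData;   // null unless this is a "BI" operator

        Op(String name) {
            this(name, null, null);
        }

        Op(String name, String imageParams, byte[] imageData) {
            this.name = name;
            this.imageParams = imageParams;
            this.imageData = imageData;
        }
    }

    /** Serializes the tokens; EI is appended implicitly after inline image data. */
    static byte[] write(List<Object> tokens) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Object token : tokens) {
            if (token instanceof Op) {
                Op op = (Op) token;
                if ("BI".equals(op.name) && op.imageData != null) {
                    put(out, "BI\n" + op.imageParams + "\nID\n");
                    out.write(op.imageData, 0, op.imageData.length);
                    // The parser dropped the EI token, so the writer restores it.
                    put(out, "\nEI\n");
                } else {
                    put(out, op.name + "\n");
                }
            } else {
                // Operands are written in their string form in this sketch.
                put(out, token + " ");
            }
        }
        return out.toByteArray();
    }

    private static void put(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        List<Object> tokens = Arrays.<Object>asList(
                new Op("q"),
                new Op("BI", "/W 1 /H 1 /BPC 8 /CS /G", new byte[] {0x41}),
                new Op("Q"));
        System.out.print(new String(write(tokens), StandardCharsets.ISO_8859_1));
    }
}
```

With this design, a token list that already contains an explicit EI token
would produce the delimiter twice, so the parser and the writer have to agree
on one of the two representations.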



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
