Re: inline images – EI operator

Tilman Hausherr Wed, 22 Apr 2015 10:23:37 -0700

Hi Lukas,

Done. A snapshot will be available within a few hours here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/1.8.10-SNAPSHOT/


Please test and confirm that it works for you.

About your second question: I have no opinion on this. The best approach would be to open an issue in JIRA and explain

- what you need it for
- whether reading or writing

"exposing our privates" is always a controversial topic here :-)

Tilman

On 22.04.2015 at 18:56, Tilman Hausherr wrote:

Hi Lukas,
Thanks for your detailed analysis. It's my fault (see https://issues.apache.org/jira/browse/PDFBOX-1794 ). I think that the 2nd solution you suggested is the better one. I've opened https://issues.apache.org/jira/browse/PDFBOX-2772 and will work on this soon.
Tilman



On 22.04.2015 at 17:26, Lukas Schober wrote:
Dear pdfbox-devs,
A co-worker and I are currently developing a service for searching and replacing content in PDF documents, based on pdfbox. We started our project with version 1.8.2 of pdfbox and recently tried to migrate to 1.8.8.
After changing to version 1.8.8 we ran into trouble with PDF content containing inline images. Our study of the code differences between those versions of pdfbox led us to the handling of the EI operator as the cause.
In version 1.8.2, the method parseNextToken() of org.apache.pdfbox.pdfparser.PDFStreamParser does an unread of the EI token on inline images. In newer versions this unread of the EI token no longer exists, with the following comment: “// the EI operator isn't unread, as it won't be processed anyway”.
As a consequence, the token sets of a document containing an inline image, as delivered by the PDFStreamParser, can't be used by the ContentStreamWriter to (re)render a valid PDF document. The reason is the missing token for the EI operator. It may be that the EI token doesn't trigger any further processing, but it is still necessary to represent the delimiter in the token sequence.
On the other hand, if an inline image should be part of a PDF page and is inserted manually as a token set, the EI token must also be present in the token set so that the ContentStreamWriter is able to create a correct PDF document.
From our point of view there are two simple approaches to get a more consistent internal representation of PDF documents with pdfbox concerning inline images: either represent the EI operator explicitly as a token (revert to the handling in version 1.8.2), or extend the writeObject method in the ContentStreamWriter to append the EI operator implicitly.
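To make the second approach concrete, here is a minimal sketch. It is not pdfbox code: the Token class and the write method are hypothetical stand-ins for PDFOperator and ContentStreamWriter.writeObject, and the image parameters between BI and ID are left out for brevity. The writer appends EI itself after the inline image data and ignores an explicit EI token if one follows, so token lists in both the 1.8.2 and the 1.8.8 style produce the same output.

```java
import java.util.Arrays;
import java.util.List;

class InlineImageTokens {

    /** Hypothetical stand-in for one parsed content stream token. */
    static final class Token {
        final String operator;  // operator name, e.g. "q", "BI", "EI", "Q"
        final String imageData; // inline image data; only set for "BI" tokens

        Token(String operator, String imageData) {
            this.operator = operator;
            this.imageData = imageData;
        }
    }

    /**
     * Writes the tokens as text. After the data of an inline image the
     * EI delimiter is always emitted, whether or not the token list
     * contains an explicit EI token.
     */
    static String write(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            Token token = tokens.get(i);
            if ("BI".equals(token.operator)) {
                // Assumption: image parameters are omitted in this sketch.
                out.append("BI\nID\n").append(token.imageData).append("\nEI\n");
                // Skip an explicit EI token so it is not written twice.
                if (i + 1 < tokens.size() && "EI".equals(tokens.get(i + 1).operator)) {
                    i++;
                }
            } else {
                out.append(token.operator).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Token list in the 1.8.8 style: no EI token after the inline image.
        List<Token> withoutEi = Arrays.asList(
                new Token("q", null),
                new Token("BI", "<image bytes>"),
                new Token("Q", null));
        System.out.print(write(withoutEi));
    }
}
```

The first approach would instead keep the writer unchanged and require the parser (or whoever builds the token list manually) to supply the EI token.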
Furthermore, in our specialization of the PDFTextStripper, the inability to access the base-class properties was a limiting factor. Is there a reason that the properties
org.apache.pdfbox.util.PDFTextStripper::startBookmarkPageNumber
org.apache.pdfbox.util.PDFTextStripper::endBookmarkPageNumber
org.apache.pdfbox.util.PDFTextStripper::pageArticles
org.apache.pdfbox.util.PDFTextStripper::characterListMapping
org.apache.pdfbox.util.PDFStreamEngine::streamResourcesStack
org.apache.pdfbox.util.PDFStreamEngine::page
really need to be private, or would it be restrictive enough to make them protected so that they can be accessed in derived classes?
Best regards,
Lukas Schober


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
---------------------------------------------------------------------