[ 
https://issues.apache.org/jira/browse/PDFBOX-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248223#comment-15248223
 ] 

Tilman Hausherr commented on PDFBOX-3321:
-----------------------------------------

I'd prefer to have some test that shows the problem. The EndstreamOutputStream 
class is probably one of the weirder pieces of code I've written, but there 
were reasons for it... see the long comment at the beginning. Maybe we can get 
around this by compressing that stream? I'd prefer not to open that can of 
worms again.

> ASCII stream data size is increased when written
> ------------------------------------------------
>
>                 Key: PDFBOX-3321
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3321
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.11
>            Reporter: Petras
>            Priority: Critical
>              Labels: signature, streams
>
> This bug is quite complicated and was discovered when visual signatures were 
> used along with parsing of the document with Preflight before signing. 
> I dug a bit to investigate the nature of this bug, as it does not appear 
> regularly. It seems to manifest itself under the following conditions:
> # The document is parsed when opened (e.g. by Preflight) and an entry with a 
> number value is detected, which is marked as direct by 
> _BaseParser.parseCOSDictionary(BaseParser.java:381)_;
> # A stream with an ASCII filter is created or present in the document, having 
> the same length as the number found in step 1 (e.g. when a visual signature 
> is created by calling _SignatureOptions#setVisualSignature()_);
> # When writing, _COSWriter_ decides how to handle the stream length by its 
> _direct_ property. If */Length* is present and is flagged as direct, it is 
> not recalculated when written.
> As a result, when the document is written, the stream length changes: the 
> written stream grows by 2 bytes, while the */Length* entry still indicates 
> the original length. That violates the PDF requirements for the */Length* entry:
> bq. The number of bytes from the beginning of the line following the keyword 
> *stream* to the last byte just before the keyword *endstream*. (There may be 
> an additional EOL marker, preceding *endstream*, that is not included in the 
> count and is not logically part of the stream data.)
> These bugs contribute to this effect:
> * PDFBOX-3320 & PDFBOX-2685, as number used for stream length is marked as 
> direct;
> * _BaseParser.parseCOSStream(BaseParser.java:490)_ parses ASCII stream using 
> _EndstreamOutputStream_ class, which always includes all characters till the 
> *endstream* keyword, though CRLF preceding *endstream* is not part of the 
> stream data;
> * _COSWriter_ checks the stream length by its _direct_ property, even though 
> it could be set as indirect via _COSObject_. As it is flagged as direct due 
> to the mutability of the cached COSNumber, the stream length is not 
> recalculated.
> As _COSWriter_ always adds CRLF at the end of the stream, the final stream 
> data is increased by 2 bytes.
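
The 2-byte growth described above can be reproduced in isolation with a minimal sketch in plain Java. It does not use PDFBox: the class and method names below (StreamLengthSketch, stripTrailingEol, appendCrlf) are hypothetical and only model the behaviour as the report describes it, namely a parser that keeps the EOL before *endstream* as data and a writer that always appends CRLF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of the issue; none of these names are PDFBox API.
class StreamLengthSketch {

    /** Drops one trailing EOL marker (CRLF, LF or CR), which /Length must not count. */
    public static byte[] stripTrailingEol(byte[] raw) {
        int end = raw.length;
        if (end >= 2 && raw[end - 2] == '\r' && raw[end - 1] == '\n') {
            end -= 2;
        } else if (end >= 1 && (raw[end - 1] == '\n' || raw[end - 1] == '\r')) {
            end -= 1;
        }
        return Arrays.copyOf(raw, end);
    }

    /** Models a writer that always emits CRLF between the data and "endstream". */
    public static byte[] appendCrlf(byte[] data) {
        byte[] out = Arrays.copyOf(data, data.length + 2);
        out[data.length] = '\r';
        out[data.length + 1] = '\n';
        return out;
    }

    public static void main(String[] args) {
        // Bytes as they sit in the file between the "stream" line and "endstream":
        // 11 bytes of ASCII hex data followed by the optional CRLF.
        byte[] inFile = "48656C6C6F>\r\n".getBytes(StandardCharsets.US_ASCII);

        // What /Length is supposed to describe: the data without the final EOL.
        int declaredLength = stripTrailingEol(inFile).length;      // 11

        // A lenient parser keeps everything up to "endstream" as stream data,
        // and the writer then appends its own CRLF on top of that.
        byte[] rewritten = appendCrlf(inFile);
        int actualLength = stripTrailingEol(rewritten).length;     // 13

        System.out.println("declared /Length: " + declaredLength);
        System.out.println("actual data size: " + actualLength);
        System.out.println("growth in bytes:  " + (actualLength - declaredLength));
    }
}
```

If */Length* is not recalculated on write, the declared value stays at 11 while the data before the final EOL is now 13 bytes, which matches the 2-byte discrepancy reported here.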



