Petras created PDFBOX-3321:
------------------------------

             Summary: ASCII stream data size is increased when written
                 Key: PDFBOX-3321
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3321
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.8.11
            Reporter: Petras
            Priority: Critical


This bug is quite complicated and was discovered when visual signatures were 
used along with parsing of the document with Preflight before signing. 

I dug a bit to investigate the nature of this bug, as it does not appear 
consistently. It seems to manifest itself under the following conditions:
# The document is parsed when opened (e.g. by Preflight) and an entry with a 
number value is detected, which is marked as direct by 
_BaseParser.parseCOSDictionary(BaseParser.java:381)_;
# A stream with an ASCII filter is created, or is already present in the 
document, and its length equals the number found in step 1 (e.g. when a visual 
signature is created by calling _SignatureOptions#setVisualSignature()_);
# When writing, _COSWriter_ decides whether to recalculate the stream length 
based on its _direct_ property. If */Length* is present and is flagged as 
direct, it is not recalculated when written.
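
The three conditions above can be illustrated with a minimal, self-contained 
sketch. It does not use the real PDFBox classes: _CachedNumber_ and 
_SharedDirectFlagDemo_ are hypothetical stand-ins, and the only assumption 
taken from this report is that number objects are cached and carry a mutable 
_direct_ flag.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedDirectFlagDemo {
    /** Simulates a writer that trusts the flag: true means /Length would be recalculated. */
    static boolean writerRecalculatesLength(CachedNumber length) {
        return !length.direct;
    }

    public static void main(String[] args) {
        // Step 1: the parser reads some dictionary entry with value 1234 and marks it direct.
        CachedNumber parsed = CachedNumber.get(1234);
        parsed.direct = true;

        // Step 2: a stream happens to have the same length, so it receives the same instance.
        CachedNumber streamLength = CachedNumber.get(1234);

        // Step 3: the writer sees the flag set by step 1 and skips the recalculation.
        System.out.println("same instance: " + (parsed == streamLength));
        System.out.println("length recalculated: " + writerRecalculatesLength(streamLength));
    }
}

/** Hypothetical stand-in for a cached, mutable PDF number object. */
final class CachedNumber {
    private static final Map<Long, CachedNumber> CACHE = new HashMap<>();

    final long value;
    boolean direct; // mutable flag, shared by every user of this instance

    private CachedNumber(long value) {
        this.value = value;
    }

    /** Returns the single shared instance for a value, as a number cache would. */
    static CachedNumber get(long value) {
        return CACHE.computeIfAbsent(value, CachedNumber::new);
    }
}
```

Because both dictionaries hold the same object, a flag set while parsing an 
unrelated entry is visible on the stream's */Length* as well.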

As a result, when the document is written, the stream length changes: the 
written stream grows by 2 bytes (CRLF is added by _COSWriter_), while the 
*/Length* entry still indicates the original length. That violates the PDF 
requirements for the */Length* entry:
bq. The number of bytes from the beginning of the line following the keyword 
*stream* to the last byte just before the keyword *endstream*. (There may be an 
additional EOL marker, preceding *endstream*, that is not included in the count 
and is not logically part of the stream data.)
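
To make the quoted rule concrete, here is a small sketch that counts the data 
bytes of a stream fragment in two ways: excluding one EOL marker before 
*endstream*, as the rule permits, and including everything up to the keyword. 
_StreamLengthDemo_ and its methods are hypothetical helpers written for this 
illustration, not PDFBox API, and the fragment is an invented ASCIIHex example.

```java
import java.nio.charset.StandardCharsets;

final class StreamLengthDemo {
    /** Index of the first data byte: just after "stream" and its EOL (CRLF or LF). */
    static int dataStart(String s) {
        int start = s.indexOf("stream") + "stream".length();
        if (s.startsWith("\r\n", start)) {
            return start + 2;
        }
        if (s.startsWith("\n", start)) {
            return start + 1;
        }
        return start;
    }

    /** Length as the quoted rule allows: one EOL before "endstream" is not counted. */
    static int lengthExcludingFinalEol(byte[] fragment) {
        String s = new String(fragment, StandardCharsets.ISO_8859_1);
        int start = dataStart(s);
        int end = s.indexOf("endstream", start);
        if (end - start >= 2 && s.startsWith("\r\n", end - 2)) {
            end -= 2;
        } else if (end - start >= 1
                && (s.charAt(end - 1) == '\n' || s.charAt(end - 1) == '\r')) {
            end -= 1;
        }
        return end - start;
    }

    /** Over-inclusive count: every byte up to the "endstream" keyword. */
    static int lengthUpToEndstream(byte[] fragment) {
        String s = new String(fragment, StandardCharsets.ISO_8859_1);
        int start = dataStart(s);
        return s.indexOf("endstream", start) - start;
    }

    public static void main(String[] args) {
        byte[] fragment = "stream\r\n48656C6C6F>\r\nendstream"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(lengthExcludingFinalEol(fragment));
        System.out.println(lengthUpToEndstream(fragment));
    }
}
```

For this fragment the first count is 11 and the second is 13; the difference is 
exactly the CRLF that precedes *endstream*. In a real file the */Length* value 
is authoritative, so this heuristic is only meant to show where the 2 bytes 
come from.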

The following issues contribute to this effect:
* PDFBOX-3320 & PDFBOX-2685, as the number used for the stream length is marked 
as direct;
* _BaseParser.parseCOSStream(BaseParser.java:490)_ parses the ASCII stream using 
the _EndstreamOutputStream_ class, which always includes all characters up to 
the *endstream* keyword, even though the CRLF preceding *endstream* is not part 
of the stream data;
* _COSWriter_ checks the stream length by its _direct_ property, even though it 
could be set as indirect via _COSObject_. As it is flagged as direct due to the 
mutability of the cached COSNumber, the stream length is not recalculated.

As _COSWriter_ always adds CRLF at the end of the stream, the final stream data 
is increased by 2 bytes.
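
The round trip described above can be sketched as follows. This is a simplified 
simulation under stated assumptions, not the actual parser or writer code: 
_RoundTripGrowthDemo_ and its methods are hypothetical, the parse step is 
assumed to keep the CRLF before *endstream*, and the write step is assumed to 
append its own CRLF while the declared length stays unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class RoundTripGrowthDemo {
    static final byte[] CRLF = {'\r', '\n'};
    static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.ISO_8859_1);

    /** Hypothetical parse step: keeps every byte up to "endstream", CRLF included. */
    static byte[] parseKeepingTrailingEol(byte[] body) {
        String s = new String(body, StandardCharsets.ISO_8859_1);
        int end = s.indexOf("endstream");
        return s.substring(0, end).getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Hypothetical write step: emits the data, then CRLF, then "endstream". */
    static byte[] write(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(data, 0, data.length);
        out.write(CRLF, 0, CRLF.length);
        out.write(ENDSTREAM, 0, ENDSTREAM.length);
        return out.toByteArray();
    }

    /** Data bytes in the written body, not counting the single EOL before "endstream". */
    static int effectiveLength(byte[] written) {
        return written.length - ENDSTREAM.length - CRLF.length;
    }

    public static void main(String[] args) {
        int declaredLength = 11; // the /Length value, left untouched by the writer
        byte[] body = "48656C6C6F>\r\nendstream".getBytes(StandardCharsets.ISO_8859_1);

        byte[] parsed = parseKeepingTrailingEol(body);
        byte[] written = write(parsed);

        System.out.println("declared /Length: " + declaredLength);
        System.out.println("effective length: " + effectiveLength(written));
    }
}
```

In this simulation the parsed data already carries the original CRLF, so after 
the writer appends another one the effective stream data is 2 bytes longer than 
the declared */Length*.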



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
