[ 
https://issues.apache.org/jira/browse/PDFBOX-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler closed PDFBOX-384.
-------------------------------------
    Resolution: Won't Fix
      Assignee: Andreas Lehmkühler

Closed as I guess some of the ideas are already implemented.

> sometimes, when PDFBox writes stream's content in a PDF file, it can no 
> longer read it
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-384
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-384
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Writing
>    Affects Versions: 0.7.3
>         Environment: pdfbox 0.73, java 5, windows os
>            Reporter: Son
>            Assignee: Andreas Lehmkühler
>         Attachments: COSStream.java, COSWriter.java
>
>
> The stream-writing code in PDFBox creates a Length entry in the stream's
> dictionary that is an indirect reference.
> The specification states (quoted from the PDF Reference 1.5, but equally
> valid for every later edition), in section 3.2.7, Stream Objects:
> ...
> stream consists of a dictionary that describes a sequence of bytes, followed
> by zero or more bytes bracketed between the keywords stream and endstream:
> dictionary
> stream
> ...Zero or more bytes...
> endstream
> All streams must be indirect objects (see Section 3.2.9, "Indirect Objects")
> and the stream dictionary must be a direct object. The keyword stream that
> follows the stream dictionary should be followed by an end-of-line marker...
> So the stream dictionary must be direct. What is not stated is that the
> entries in the dictionary should be direct as well ... Later on, the
> Stream Extent paragraph says:
> ...
> Every stream dictionary has a Length entry that indicates how many bytes of
> the PDF file are used for the stream's data. (If the stream has a filter,
> Length is the number of bytes of encoded data.) In addition, most filters are
> defined so that the data is self-limiting; that is, they use an encoding
> scheme in which an explicit end-of-data (EOD) marker delimits the extent of
> the data. Finally, streams are used to represent many objects from whose
> attributes a length can be inferred. All of these constraints must be
> consistent.
> ...
> This says that most filters handle self-delimiting data, so not every
> filtering algorithm is required to do so.
> So, in order to set the Length value explicitly inside the stream dictionary,
> the content should be filtered before the dictionary is written.
> The current PDFBox behavior is the following
> (see org.pdfbox.pdfwriter.COSWriter.visitFromStream(COSStream obj) at line
> 929):
> ...
>             InputStream input = obj.getFilteredStream();
>             // set the length of the stream and write stream dictionary
>             COSObject lengthObject = new COSObject( null );
>             
>             obj.setItem(COSName.LENGTH, lengthObject);
>             // write the stream content
>             visitFromDictionary( obj );
>             getStandardOutput().write(STREAM);
> ...
>             // writes the content
> ...
>             lengthObject.setObject( new COSInteger( totalAmountWritten ) );
>             getStandardOutput().writeCRLF();
>             getStandardOutput().write(ENDSTREAM);
>             getStandardOutput().writeEOL();
>             return null;
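For illustration, the two ways of writing Length discussed in the quoted report can be sketched in plain Java. This is only a minimal sketch under stated assumptions, not PDFBox code: the class StreamLengthSketch, its method names, and the object numbers are hypothetical, and it uses nothing but java.io and java.util.zip. Variant A filters the content first, so the encoded byte count is known and Length can be written as a direct integer. Variant B mirrors the behavior quoted above: the dictionary is written with an indirect Length reference, the bytes are counted while they are copied, and the integer object is emitted afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

public class StreamLengthSketch {

    // Apply the Flate filter up front so the encoded size is known
    // before any dictionary is written.
    static byte[] flate(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            out.write(raw);
        }
        return buf.toByteArray();
    }

    static void ascii(OutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Variant A: content is already encoded, so Length is a direct integer.
    static void writeDirectLength(OutputStream out, int objNum, byte[] encoded)
            throws IOException {
        ascii(out, objNum + " 0 obj\n");
        ascii(out, "<< /Filter /FlateDecode /Length " + encoded.length + " >>\n");
        ascii(out, "stream\n");
        out.write(encoded);
        ascii(out, "\nendstream\nendobj\n");
    }

    // Variant B: Length is an indirect reference; its value is only known
    // after the encoded bytes have been copied, so it is written afterwards
    // as a separate integer object.
    static void writeIndirectLength(OutputStream out, int objNum, int lengthObjNum,
                                    InputStream encoded) throws IOException {
        ascii(out, objNum + " 0 obj\n");
        ascii(out, "<< /Filter /FlateDecode /Length " + lengthObjNum + " 0 R >>\n");
        ascii(out, "stream\n");
        long total = 0;
        byte[] chunk = new byte[4096];
        int n;
        while ((n = encoded.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        ascii(out, "\nendstream\nendobj\n");
        ascii(out, lengthObjNum + " 0 obj\n" + total + "\nendobj\n");
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = flate("BT (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII));

        ByteArrayOutputStream direct = new ByteArrayOutputStream();
        writeDirectLength(direct, 4, encoded);

        ByteArrayOutputStream indirect = new ByteArrayOutputStream();
        writeIndirectLength(indirect, 4, 5, new ByteArrayInputStream(encoded));

        System.out.println("encoded bytes:          " + encoded.length);
        System.out.println("direct variant bytes:   " + direct.size());
        System.out.println("indirect variant bytes: " + indirect.size());
    }
}
```

The trade-off is that variant A has to hold the whole encoded stream in memory (or in a temporary file) before the dictionary is written, while variant B can copy the data through in one pass at the cost of one extra object per stream.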



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
