Tilman Hausherr created PDFBOX-2762:
---------------------------------------

             Summary: remove parseCOSStream() call from PDFStreamParser
                 Key: PDFBOX-2762
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2762
             Project: PDFBox
          Issue Type: Task
          Components: Parsing
    Affects Versions: 2.0.0
            Reporter: Tilman Hausherr
            Assignee: Tilman Hausherr
             Fix For: 2.0.0


This code is found in PDFStreamParser:
{code}
                if (c == '<')
                {
                    COSDictionary pod = parseCOSDictionary();
                    skipSpaces();
                    if ((char)pdfSource.peek() == 's')
                    {
                        retval = parseCOSStream( pod );
                    }
                    else
                    {
                        retval = pod;
                    }
                }
{code}
This is incorrect. PDFStreamParser is for content streams, and content streams 
cannot contain stream objects: the spec requires "All streams shall be indirect 
objects". An "indirect object" is something between obj and endobj, but indirect 
objects are not allowed in content streams: "Indirect objects and object 
references shall not be permitted at all". So for a valid content stream, 
parseCOSStream() will never be called. Thus the new code will be:
{code}
                if (c == '<')
                {
                    retval = parseCOSDictionary();
                }
{code}
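
As a rough illustration of the argument above, here is a minimal, self-contained sketch that is not PDFBox code. The class DictThenStreamCheck and its method are hypothetical names made up for this example. It scans content-stream text and reports whether a top-level "<< ... >>" dictionary is directly followed by the "stream" keyword, which is roughly the condition the removed branch tested (the original only peeks for the character 's'). It is a simplification: literal strings, hex strings and comments are not handled. The first sample assumes a marked-content sequence with an inline property dictionary, one of the places where a dictionary legitimately appears in a content stream.

```java
// Hypothetical helper for illustration only; not part of PDFBox.
final class DictThenStreamCheck
{
    // Returns true if a top-level "<< ... >>" dictionary is followed
    // (after optional whitespace) by the keyword "stream".
    // Simplification: strings, hex strings and comments are ignored.
    static boolean dictionaryFollowedByStream(String content)
    {
        int depth = 0;
        int i = 0;
        int n = content.length();
        while (i < n)
        {
            if (content.startsWith("<<", i))
            {
                depth++;
                i += 2;
            }
            else if (depth > 0 && content.startsWith(">>", i))
            {
                depth--;
                i += 2;
                if (depth == 0)
                {
                    // Same idea as skipSpaces() followed by a peek.
                    int j = i;
                    while (j < n && Character.isWhitespace(content.charAt(j)))
                    {
                        j++;
                    }
                    if (content.startsWith("stream", j))
                    {
                        return true;
                    }
                }
            }
            else
            {
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        // Dictionary as an operand of BDC: allowed in a content stream.
        String contentStream =
            "/Span << /MCID 0 >> BDC\nBT /F1 12 Tf (Hello) Tj ET\nEMC";
        // Body of an indirect stream object: not allowed in a content stream.
        String streamObjectBody =
            "<< /Length 5 >>\nstream\nhello\nendstream";

        System.out.println(dictionaryFollowedByStream(contentStream));    // false
        System.out.println(dictionaryFollowedByStream(streamObjectBody)); // true
    }
}
```

In the first sample the dictionary is followed by the BDC operator, so only the plain dictionary path is needed. The second sample shows the shape that can only occur between obj and endobj.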
To be sure, I tested my own test set and the digitalcorpora set (250000 files) 
to see whether parseCOSStream() is ever called in PDFStreamParser. It isn't. 
How did this incorrect code end up there? I don't know, but it has been there 
since 2002:
http://pdfbox.cvs.sourceforge.net/viewvc/pdfbox/pdfbox/src/org/pdfbox/pdfparser/PDFStreamParser.java?revision=1.1&view=markup

Why do I care about this? It is related to a posting on a mailing list by 
Andrea Vacondio, who mentioned that there are several versions of 
parseCOSStream(), so I'm trying to clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
