[
https://issues.apache.org/jira/browse/PDFBOX-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145344#comment-13145344
]
Maruan Sahyoun commented on PDFBOX-1000:
----------------------------------------
Hi Adam,
I'm planning to put some work into the conforming parser, but first let me
ask a couple of questions:
# What are the main areas you were trying to address? To me, the most pressing
need was correct Xref resolution, but that has been solved.
# Is there any further work you have put into this that you would like to post
before any changes are made?
Kind regards
Maruan
> Conforming parser
> -----------------
>
> Key: PDFBOX-1000
> URL: https://issues.apache.org/jira/browse/PDFBOX-1000
> Project: PDFBox
> Issue Type: New Feature
> Components: Parsing
> Reporter: Adam Nichols
> Assignee: Adam Nichols
> Attachments: COSUnread.java, ConformingPDDocument.java,
> ConformingPDFParser.java, ConformingPDFParserTest.java, XrefEntry.java,
> conforming-parser.patch, gdb-refcard.pdf
>
>
> A conforming parser will start at the end of the file and read backward until
> it has read the EOF marker, the xref location, and trailer[1]. Once this is
> read, it will read in the xref table so it can locate other objects and
> revisions. This also allows skipping objects which have been rendered
> obsolete (per the xref table)[2]. It also allows the minimum amount of
> information to be read when the file is loaded, and then subsequent
> information will be loaded if and when it is requested. This is all laid out
> in the official PDF specification, ISO 32000-1:2008.
> Existing code will be re-used where possible, but this will require new
> classes in order to accommodate the lazy reading, which is a very different
> paradigm from the existing parser. Using separate classes will also
> eliminate the possibility of regression bugs making their way into the
> PDDocument or BaseParser classes. Changes to existing classes will be kept
> to a minimum for the same reason.
> [1] Section 7.5.5 "Conforming readers should read a PDF file from its end"
> [2] Section 7.5.4 "the entire file need not be read to locate any particular
> object"
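
For reference, the read-from-the-end step described in the issue can be sketched roughly as below. This is only an illustration and is not taken from the attached ConformingPDFParser.java: the class name StartXrefLocator, the method findStartXref, and the 1024-byte tail window are all assumptions made for the example. It looks for the last %%EOF marker, then for the startxref keyword before it, and parses the byte offset in between.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: locates the xref offset from the file tail.
final class StartXrefLocator {

    // Assumed size of the tail window to inspect; real files may need a larger one.
    private static final int TAIL_WINDOW = 1024;

    /** Returns the offset written after the last "startxref", or -1 if it cannot be found. */
    static long findStartXref(byte[] pdf) {
        int window = Math.min(pdf.length, TAIL_WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so indexes stay byte-accurate.
        String tail = new String(pdf, pdf.length - window, window, StandardCharsets.ISO_8859_1);

        int eof = tail.lastIndexOf("%%EOF");
        if (eof < 0) {
            return -1;
        }
        // Search backward from the EOF marker for the keyword.
        int keyword = tail.lastIndexOf("startxref", eof);
        if (keyword < 0) {
            return -1;
        }
        String digits = tail.substring(keyword + "startxref".length(), eof).trim();
        try {
            return Long.parseLong(digits);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Tiny made-up file: the "xref" keyword starts at byte 9, right after the header line.
        String sample = "%PDF-1.4\nxref\n0 1\n0000000000 65535 f \n"
                + "trailer\n<< /Size 1 >>\nstartxref\n9\n%%EOF\n";
        System.out.println(findStartXref(sample.getBytes(StandardCharsets.ISO_8859_1))); // prints 9
    }
}
```

A lazy parser would then seek to the returned offset, read the xref table and trailer there, and only load individual objects when they are requested. Incremental updates (following /Prev entries in the trailer) and cross-reference streams are deliberately left out of this sketch.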