[ https://issues.apache.org/jira/browse/PDFBOX-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248387#comment-13248387 ]
Adam Nichols edited comment on PDFBOX-1000 at 4/6/12 2:13 PM:
--------------------------------------------------------------

I would prefer you put the changes in the ConformingPDFParser class. I'm really glad to see that work on the conforming parser is continuing even though I don't have time to contribute at the moment. The more we can combine efforts (e.g. using code from the NonSequentialPDFParser), the better. I've found that the more code is re-used, the quicker bugs are brought to light (at which point we can fix them), so I'd much rather see code re-used than copied and pasted from one class to another.

> Conforming parser
> -----------------
>
>                 Key: PDFBOX-1000
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1000
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Parsing
>            Reporter: Adam Nichols
>            Assignee: Adam Nichols
>         Attachments: COSUnread.java, ConformingPDDocument.java,
>                      ConformingPDFParser.java, ConformingPDFParserTest.java, PDFLexer.java,
>                      XrefEntry.java, conforming-parser.patch, gdb-refcard.pdf
>
>
> A conforming parser will start at the end of the file and read backward until
> it has read the EOF marker, the xref location, and the trailer[1]. Once this is
> read, it will read in the xref table so it can locate other objects and
> revisions. This also allows skipping objects which have been rendered
> obsolete (per the xref table)[2].
> It also allows the minimum amount of
> information to be read when the file is loaded, and then subsequent
> information will be loaded if and when it is requested. This is all laid out
> in the official PDF specification, ISO 32000-1:2008.
> Existing code will be re-used where possible, but this will require new
> classes in order to accommodate the lazy reading, which is a very different
> paradigm from the existing parser. Using separate classes will also
> eliminate the possibility of regression bugs making their way into the
> PDDocument or BaseParser classes. Changes to existing classes will be kept
> to a minimum in order to prevent regression bugs.
>
> [1] Section 7.5.5 "Conforming readers should read a PDF file from its end"
> [2] Section 7.5.4 "the entire file need not be read to locate any particular
> object"
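The reading order described in the issue (ISO 32000-1, Sections 7.5.5 and 7.5.4) can be sketched in Java, PDFBox's own language. This is a minimal illustration under stated assumptions, not the attached ConformingPDFParser: the class `TailReader` and its methods are hypothetical and are not PDFBox API. For brevity it works on an in-memory `byte[]`, assumes the `%%EOF` and `startxref` markers lie within the last 1024 bytes, and handles only a single classic xref table (no cross-reference streams and no `/Prev` chain across revisions).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not PDFBox API) of the first steps of a conforming
 * reader: locate %%EOF and the startxref offset at the end of the file,
 * then read the classic xref table found at that offset.
 */
final class TailReader {

    // Assumption of this sketch: the end-of-file markers are in the last 1024 bytes.
    private static final int TAIL_WINDOW = 1024;

    /** Returns the byte offset written after the last "startxref" keyword. */
    static long findStartXref(byte[] pdf) {
        int from = Math.max(0, pdf.length - TAIL_WINDOW);
        // ISO-8859-1 maps each byte to exactly one char, so indexes stay byte-accurate.
        String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        int eof = tail.lastIndexOf("%%EOF");
        if (eof < 0) {
            throw new IllegalArgumentException("no %%EOF marker near end of file");
        }
        // Search backward from %%EOF for the startxref keyword.
        int sx = tail.lastIndexOf("startxref", eof);
        if (sx < 0) {
            throw new IllegalArgumentException("no startxref before %%EOF");
        }
        String number = tail.substring(sx + "startxref".length(), eof).trim();
        return Long.parseLong(number);
    }

    /** Maps object number to byte offset for in-use ('n') entries of the xref table. */
    static Map<Integer, Long> readXrefTable(byte[] pdf, long offset) {
        int start = (int) offset;
        String text = new String(pdf, start, pdf.length - start, StandardCharsets.ISO_8859_1);
        String[] tok = text.trim().split("\\s+");
        if (!tok[0].equals("xref")) {
            throw new IllegalArgumentException("expected 'xref' at offset " + offset);
        }
        Map<Integer, Long> entries = new LinkedHashMap<>();
        int i = 1;
        // Each subsection is "first count" followed by count entries "offset gen type".
        while (i < tok.length && !tok[i].equals("trailer")) {
            int first = Integer.parseInt(tok[i++]);
            int count = Integer.parseInt(tok[i++]);
            for (int n = 0; n < count; n++) {
                long objOffset = Long.parseLong(tok[i++]);
                i++; // generation number, ignored in this sketch
                String type = tok[i++];
                if (type.equals("n")) { // 'f' entries are free objects and are skipped
                    entries.put(first + n, objOffset);
                }
            }
        }
        return entries;
    }
}
```

With the map in hand, a lazy reader only needs to seek to `entries.get(n)` when object n is first requested, so nothing beyond the tail and the xref table has to be read at load time; free (`f`) entries are never read at all.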