[
https://issues.apache.org/jira/browse/PDFBOX-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Justin LeFebvre updated PDFBOX-183:
-----------------------------------
Attachment: ConflictList.diff
The file referenced in this bug report contains multiple entries for certain
objects. Previously, when PDFBox encountered a duplicate object while parsing,
the new object completely replaced the old one. The spec says that files should
not contain such duplicates, but most readers render the file without trouble
because they use the xref table to decide which objects to use. To fix this, we
added code that handles these conflicts. While parsing, if we see a second
instance of an object, we put that instance, its key, and its byte offset into
a new conflictList of ConflictObjs. Once the rest of the file has been parsed,
we have the xref information and use the byte offsets to determine whether each
conflicting object should replace the one we saw originally.
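
For readers who don't want to open the diff, the approach described above can
be sketched roughly as follows. This is not the code from ConflictList.diff:
the class name, the method names, the "number generation" string key, and the
String stand-in for a parsed object are all assumptions made for illustration.
In particular, the rule "replace only when the duplicate's byte offset exactly
matches the xref entry" is one plausible reading of the description, not
necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical, not the PDFBox API. */
public class ConflictResolutionSketch {

    /** A duplicate object seen during parsing, held back until xref data is known. */
    private static final class ConflictObj {
        final String key;       // assumed key format: "objectNumber generation", e.g. "12 0"
        final long byteOffset;  // where this duplicate instance starts in the file
        final String body;      // stand-in for the parsed object

        ConflictObj(String key, long byteOffset, String body) {
            this.key = key;
            this.byteOffset = byteOffset;
            this.body = body;
        }
    }

    private final Map<String, String> objects = new HashMap<>();
    private final List<ConflictObj> conflictList = new ArrayList<>();

    /** Called for every object found while scanning the file. */
    public void addObject(String key, long byteOffset, String body) {
        if (objects.containsKey(key)) {
            // Second (or later) instance: defer the decision instead of overwriting.
            conflictList.add(new ConflictObj(key, byteOffset, body));
        } else {
            objects.put(key, body);
        }
    }

    /** Called once after parsing, when the xref offsets are available. */
    public void resolveConflicts(Map<String, Long> xrefOffsets) {
        for (ConflictObj conflict : conflictList) {
            Long expected = xrefOffsets.get(conflict.key);
            // Assumption: the duplicate wins only if the xref table points at it.
            if (expected != null && expected.longValue() == conflict.byteOffset) {
                objects.put(conflict.key, conflict.body);
            }
        }
        conflictList.clear();
    }

    /** Returns the object currently stored under the given key, or null. */
    public String get(String key) {
        return objects.get(key);
    }
}
```

With this shape, a duplicate that the xref table does not point to is simply
dropped after parsing, so the first instance is kept; a duplicate that the xref
table does point to replaces the first instance.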
> java.lang.NullPointerException in highlighter.generateXMLHig
> ------------------------------------------------------------
>
> Key: PDFBOX-183
> URL: https://issues.apache.org/jira/browse/PDFBOX-183
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Priority: Minor
> Attachments: ConflictList.diff
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1517476
> Originally submitted by nobody on 2006-07-05 05:11.
> Sample code:
> try
> {
>     URL pdfURL = new URL( mPdfUrl );
>
>     doc = PDDocument.load( pdfURL.openStream() );
>     PDFHighlighter highlighter = new PDFHighlighter();
>     highlighter.generateXMLHighlight( doc,
>         mHighlightWords.split( " " ), fiw );
>
> }
> catch (Exception e)
> Using an ADLIB-converted PDF (see attached file)
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1517476&file_id=183934
> cv1.pdf (application/pdf), 109109 bytes
> pdf
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.