[ https://issues.apache.org/jira/browse/PDFBOX-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764228#comment-17764228 ]
Tim Allison edited comment on PDFBOX-5682 at 9/12/23 2:41 PM:
--------------------------------------------------------------

This is the part from that document that is, erm, eye-opening:
{noformat}
4.2 AF entry not in the catalog
4.2.1 General
Most existing applications that take advantage of Associated Files use the AF entry in the document catalog as the place to make the association. However, the concept of Associated Files goes well beyond association only with the file as a whole, and also allows for defining relations between embedded files and certain pages, annotations, form fields, graphics objects, structure elements in the tagging structure, DParts or any other PDF object.
{noformat}
And, yes, the document goes on to say, PDF writers should do the traditional thing, but...

> Long/permanent hang in PDFBox 3.x
> ---------------------------------
>
> Key: PDFBOX-5682
> URL: https://issues.apache.org/jira/browse/PDFBOX-5682
> Project: PDFBox
> Issue Type: Bug
> Reporter: Tim Allison
> Priority: Minor
>
> I found two files in the regression tests where we're now getting timeouts at
> 3 minutes where we weren't before. Unfortunately, PDFBox's export:text works
> on both, so it is probably another structural feature, perhaps a problem in
> Tika?
> This file halts after printing out the header for Table 19 on page 46:
> https://corpora.tika.apache.org/base/docs/govdocs1/078/078656.pdf
> Pure PDFBox's export:text complains multiple times: "Page skipped due to an
> invalid or missing type null", but it does finish quickly.
> This file halts after extracting {{"854,793,592"}}:
> https://corpora.tika.apache.org/base/docs/commoncrawl3_refetched/G7/G7BO7PNCCREVF2BCY5YSYOPYDLMBYASY
> Pure PDFBox's export:text processes this without problem.