[ https://issues.apache.org/jira/browse/PDFBOX-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764228#comment-17764228 ]

Tim Allison edited comment on PDFBOX-5682 at 9/12/23 2:41 PM:
--------------------------------------------------------------

This is the part from that document that is, erm, eye-opening:

{noformat}
4.2 AF entry not in the catalog
4.2.1 General
Most existing applications that take advantage of Associated Files use the AF entry in the document catalog as the place to make the association. However, the concept of Associated Files goes well beyond association only with the file as a whole, and also allows for defining relations between embedded files and certain pages, annotations, form fields, graphics objects, structure elements in the tagging structure, DParts or any other PDF object.
{noformat}

And, yes, the document goes on to say that PDF writers should do the traditional thing, but...
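In practice, the quoted passage means that an extractor which only reads the catalog's /AF entry can miss files associated with pages, annotations, form fields, or structure elements. Below is a minimal sketch of collecting `AF` values from anywhere in the object graph. It assumes a hypothetical in-memory model (dictionaries as `Map<String, Object>`, arrays as `List<Object>`), which is not the PDFBox COS API, and `AfScanner`/`collectAssociatedFiles` are illustrative names. PDF object graphs contain back-references, such as a page's `/Parent` or an annotation's `/P`, so the walk tracks visited objects by identity. Whether anything like this is involved in the hang reported here is not established.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model, NOT the PDFBox API: PDF dictionaries are Map<String, Object>,
 * PDF arrays are List<Object>, everything else is a leaf value.
 */
public class AfScanner {

    /** Collects every value stored under an "AF" key anywhere reachable from root. */
    public static List<Object> collectAssociatedFiles(Object root) {
        List<Object> found = new ArrayList<>();
        // Identity-based set: avoids calling hashCode() on self-referencing maps
        // (which would recurse forever) and visits each container only once.
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, found, visited);
        return found;
    }

    private static void walk(Object node, List<Object> found, Set<Object> visited) {
        if (!(node instanceof Map) && !(node instanceof List)) {
            return; // leaf: name, number, string, ...
        }
        if (!visited.add(node)) {
            return; // already seen via a back-reference: stop instead of looping
        }
        if (node instanceof Map) {
            Map<?, ?> dict = (Map<?, ?>) node;
            Object af = dict.get("AF");
            if (af != null) {
                found.add(af); // AF on any dictionary, not just the catalog
            }
            for (Object value : dict.values()) {
                walk(value, found, visited);
            }
        } else {
            for (Object item : (List<?>) node) {
                walk(item, found, visited);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> catalog = new LinkedHashMap<>();
        Map<String, Object> pages = new LinkedHashMap<>();
        Map<String, Object> page = new LinkedHashMap<>();
        Map<String, Object> annot = new LinkedHashMap<>();

        catalog.put("Type", "Catalog");
        catalog.put("AF", List.of("catalog-level file spec"));
        catalog.put("Pages", pages);
        pages.put("Kids", List.of(page));
        page.put("Parent", pages);   // back-reference: creates a cycle
        page.put("AF", List.of("page-level file spec"));
        page.put("Annots", List.of(annot));
        annot.put("P", page);        // another back-reference
        annot.put("AF", List.of("annotation-level file spec"));

        List<Object> afs = collectAssociatedFiles(catalog);
        System.out.println(afs.size() + " AF entries found");
    }
}
```

With the sample graph, the walk finds the catalog-, page-, and annotation-level entries and terminates despite the two cycles.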




> Long/permanent hang in PDFBox 3.x
> ---------------------------------
>
>                 Key: PDFBOX-5682
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5682
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Minor
>
> I found two files in the regression tests where we're now getting 3-minute
> timeouts where we weren't before. Unfortunately, PDFBox's export:text works
> on both, so the hang is probably triggered by another structural feature, or
> perhaps it's a problem in Tika?
> This file halts after printing out the header for Table 19 on page 46: 
> https://corpora.tika.apache.org/base/docs/govdocs1/078/078656.pdf
> Pure PDFBox's export:text complains multiple times ("Page skipped due to an
> invalid or missing type null"), but it does finish quickly.
> This file halts after extracting {{"854,793,592"}}: 
> https://corpora.tika.apache.org/base/docs/commoncrawl3_refetched/G7/G7BO7PNCCREVF2BCY5YSYOPYDLMBYASY
> Pure PDFBox's export:text processes this without problem.
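The report describes per-file timeouts at 3 minutes in the regression runs. Below is a minimal sketch of how a harness can flag a file as hung instead of blocking forever. It assumes a generic `Callable` stands in for the actual extraction call, and the class and method names (`TimeoutCheck`, `runWithTimeout`) are hypothetical, not Tika's or PDFBox's. `Future.cancel(true)` only interrupts the worker thread, and parsing code that never checks for interrupts can keep running. For that reason the worker threads here are daemons, so a stuck one does not keep the JVM alive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutCheck {

    /** Outcome of one extraction attempt. */
    public enum Result { OK, TIMEOUT, FAILED }

    /** Runs the task on a worker thread and waits at most timeoutMillis for it. */
    public static Result runWithTimeout(Callable<?> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Result.OK;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; it may still ignore this
            return Result.TIMEOUT;
        } catch (ExecutionException e) {
            return Result.FAILED; // the task itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Result.FAILED;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for a real extraction call.
        Callable<String> quick = () -> "text";
        Callable<String> hang = () -> {
            while (true) {
                Thread.sleep(1_000); // interruptible, so cancel(true) ends it
            }
        };
        System.out.println(runWithTimeout(quick, 1_000));
        System.out.println(runWithTimeout(hang, 200));
    }
}
```

In a real regression run, the timeout would be the 3 minutes mentioned above rather than the short values used in this demo.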



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
