[ 
https://issues.apache.org/jira/browse/PDFBOX-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303522#comment-17303522
 ] 

Tim Allison commented on PDFBOX-5128:
-------------------------------------

The process hasn't finished, but I'm dumping the files here:

[https://corpora.tika.apache.org/base/xmps/]

I'm roughly binning them by the file type of the container file, including: 
[https://corpora.tika.apache.org/base/xmps/pdf/] 

 

Let me know if I can do any processing on these or if I botched the extraction.

 

> Support parsing non standardized XMP 
> -------------------------------------
>
>                 Key: PDFBOX-5128
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5128
>             Project: PDFBox
>          Issue Type: Task
>          Components: XmpBox
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>         Attachments: PDFBOX.zip, image-2021-03-17-09-00-57-653.png
>
>
> XMP parsing currently supports only known XMP schemas, as has been discussed. 
> That shall be extended to support arbitrary but valid XMP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

