[ https://issues.apache.org/jira/browse/PDFBOX-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303522#comment-17303522 ]
Tim Allison commented on PDFBOX-5128:
-------------------------------------
The process hasn't finished, but I'm dumping the files here:
[https://corpora.tika.apache.org/base/xmps/]
I'm roughly binning them by the file type of the container file, including:
[https://corpora.tika.apache.org/base/xmps/pdf/]
Let me know if I can do any processing on these or if I botched the extraction.
> Support parsing non standardized XMP
> -------------------------------------
>
> Key: PDFBOX-5128
> URL: https://issues.apache.org/jira/browse/PDFBOX-5128
> Project: PDFBox
> Issue Type: Task
> Components: XmpBox
> Reporter: Maruan Sahyoun
> Assignee: Maruan Sahyoun
> Priority: Major
> Attachments: PDFBOX.zip, image-2021-03-17-09-00-57-653.png
>
>
> XMP parsing currently supports only known XMP schemas, as has been discussed.
> This shall be extended to support arbitrary but valid XMP.