[ https://issues.apache.org/jira/browse/TIKA-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677364#comment-17677364 ]
Nick Burch commented on TIKA-3703:
----------------------------------
I guess we could include a data package metadata file to better describe the
other files in the zip?
[https://specs.frictionlessdata.io/data-package/#introduction]
That might make it "more standard" and help people understand what they've got
and why.
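
To make the suggestion concrete, here is a rough sketch of writing a zip with a datapackage.json descriptor next to the unpacked files. It is not Tika code: the function names, the package name and the file names are all made up for illustration, and the resource fields used (name, path, mediatype, bytes, hash) are my reading of the Data Package / Data Resource specs linked above, so treat them as assumptions to check against the spec.

```python
import hashlib
import io
import json
import mimetypes
import re
import zipfile


def build_descriptor(package_name, files):
    """Build a minimal datapackage.json-style descriptor (hypothetical helper).

    `files` maps a path inside the zip to that file's bytes.
    """
    resources = []
    for index, (path, data) in enumerate(sorted(files.items())):
        mediatype, _ = mimetypes.guess_type(path)
        # Resource names are kept lower case and limited to [a-z0-9._-];
        # the index prefix keeps them unique within the package.
        slug = re.sub(r"[^a-z0-9._-]+", "-", path.lower()).strip("-")
        resources.append({
            "name": f"{index}-{slug}",
            "path": path,
            "mediatype": mediatype or "application/octet-stream",
            "bytes": len(data),
            "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
        })
    return {"name": package_name, "resources": resources}


def write_package(package_name, files):
    """Return zip bytes holding the files plus a datapackage.json descriptor."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, data in files.items():
            archive.writestr(path, data)
        descriptor = build_descriptor(package_name, files)
        archive.writestr("datapackage.json", json.dumps(descriptor, indent=2))
    return buffer.getvalue()
```

Calling write_package("tika-unpack-example", {"content.txt": b"hello", "thumbnail.png": b"..."}) would give a zip that a client can open, read datapackage.json first, and then know the size, type and checksum of every other entry without guessing from file names.
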
> Consider adding a frictionless data package output format
> ---------------------------------------------------------
>
> Key: TIKA-3703
> URL: https://issues.apache.org/jira/browse/TIKA-3703
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> For those who want more than just text and metadata, e.g. bytes for
> thumbnails, embedded images, embedded files or rendered pages, it would
> be great to return that data in a standard format. Our current /unpack
> endpoint uses a zip file, but with our own "standard".
> I was thinking about going the pure JSON route by including these
> byte streams as base64-encoded metadata values in our current metadata
> object. Not sure which is the better way to go.
> I'm opening this issue to discuss options.
>
> Reference: [https://frictionlessdata.io/standards/#standards-toolkit]
> We'd want to make this available as an endpoint on tika-server
> ({{/v2/unpack}} or something else?) and as a command-line option in tika-app.
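
For comparison, the pure JSON option from the issue description could look roughly like the sketch below. The "embedded", "name" and "base64" keys and both function names are hypothetical, chosen only for illustration; they are not part of Tika's current metadata object.

```python
import base64
import json


def to_json_record(metadata, attachments):
    """Serialize metadata plus embedded byte streams as one JSON string.

    `attachments` maps a (hypothetical) resource name to its raw bytes.
    """
    record = dict(metadata)
    record["embedded"] = [
        {"name": name, "base64": base64.b64encode(data).decode("ascii")}
        for name, data in attachments.items()
    ]
    return json.dumps(record)


def from_json_record(text):
    """Recover the raw bytes of every embedded resource from the JSON string."""
    record = json.loads(text)
    return {
        item["name"]: base64.b64decode(item["base64"])
        for item in record["embedded"]
    }
```

One trade-off to keep in mind when weighing this against the zip approach: base64 encodes every 3 bytes as 4 characters, so the embedded payloads grow by about a third before any transport compression.
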
--
This message was sent by Atlassian Jira
(v8.20.10#820010)