[ https://issues.apache.org/jira/browse/TIKA-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677353#comment-17677353 ]
Tim Allison commented on TIKA-3703:
-----------------------------------

Users can compress the output with an "accepts" header, but you're right...why... We don't want to add an extra step, and a zip is simpler. Thank you, Nick!

> Consider adding a frictionless data package output format
> ---------------------------------------------------------
>
>                 Key: TIKA-3703
>                 URL: https://issues.apache.org/jira/browse/TIKA-3703
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> For those who want more than just text and metadata, e.g. bytes for
> thumbnails, or embedded images or embedded files or rendered pages, it would
> be great to return that data in a standard format. Our current /unpack
> endpoint uses a zip file but with our own "standard".
> I was thinking about heading down the pure json option by including these
> byte streams as base64 encoded metadata values in our current metadata
> object. Not sure which is the better way to go.
> I'm opening this issue to discuss options.
>
> Reference: [https://frictionlessdata.io/standards/#standards-toolkit]
> We'd want to make this available as an endpoint on tika-server
> ({{/v2/unpack}} or something else?) and as a commandline option in tika-app.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)