[ 
https://issues.apache.org/jira/browse/TIKA-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677364#comment-17677364
 ] 

Nick Burch commented on TIKA-3703:
----------------------------------

I guess we could include a Data Package metadata file (datapackage.json) to 
better describe the other files in the zip? 
[https://specs.frictionlessdata.io/data-package/#introduction]

That might make it "more standard", so people can understand what they've got 
and why.
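
To make that a bit more concrete, here is a rough sketch in Python (not Tika 
code) of a zip that carries a datapackage.json descriptor next to the extracted 
files. The file names, the package name, the build_package helper and the 
choice of per-resource fields are all assumptions for illustration only; the 
current /unpack output has its own layout, and nothing here is an agreed 
design.

```python
import hashlib
import io
import json
import zipfile


def build_package(files, package_name="tika-unpack-example"):
    """Return zip bytes holding the given files plus a datapackage.json.

    `files` maps a path inside the zip to a (media type, bytes) pair.
    All names here are hypothetical, not an existing Tika layout.
    """
    resources = []
    for index, (path, (mediatype, payload)) in enumerate(files.items()):
        # One Data Resource entry per file in the zip.
        resources.append({
            "name": f"resource-{index}",
            "path": path,
            "mediatype": mediatype,
            "bytes": len(payload),
            "hash": "sha256:" + hashlib.sha256(payload).hexdigest(),
        })
    descriptor = {"name": package_name, "resources": resources}

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The descriptor sits at the top level of the package.
        archive.writestr("datapackage.json", json.dumps(descriptor, indent=2))
        for path, (_, payload) in files.items():
            archive.writestr(path, payload)
    return buffer.getvalue()


# Placeholder content standing in for extracted text and an embedded image.
example = build_package({
    "content.txt": ("text/plain", b"extracted text"),
    "embedded/image0.png": ("image/png", b"\x89PNG placeholder bytes"),
})
```

A consumer could then read datapackage.json first and use the path, mediatype 
and hash of each resource to work out what the other entries in the zip are.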

> Consider adding a frictionless data package output format
> ---------------------------------------------------------
>
>                 Key: TIKA-3703
>                 URL: https://issues.apache.org/jira/browse/TIKA-3703
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> For those who want more than just text and metadata, e.g. bytes for 
> thumbnails, or embedded images or embedded files or rendered pages, it would 
> be great to return that data in a standard format. Our current /unpack 
> endpoint uses a zip file but with our own "standard".
> I was thinking about heading down the pure json option by including these 
> byte streams as base64 encoded metadata values in our current metadata 
> object. Not sure which is the better way to go.
> I'm opening this issue to discuss options.
>  
> Reference: [https://frictionlessdata.io/standards/#standards-toolkit]
> We'd want to make this available as an endpoint on tika-server 
> (/v2/unpack or something else?) and as a command-line option in tika-app.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
