[ https://issues.apache.org/jira/browse/TIKA-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728020#comment-17728020 ]

Tim Allison commented on TIKA-4059:
-----------------------------------

Are there any other formats that are typically gzipped?

> Consider parsing common gzipped formats like we do with package files
> ---------------------------------------------------------------------
>
>                 Key: TIKA-4059
>                 URL: https://issues.apache.org/jira/browse/TIKA-4059
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>
> For docx and other zip-based formats, we have a zip detector and we parse those 
> container files as a single file.  There are a handful of file formats that 
> are often gzipped: tgz, svgz and warc files.
> Users currently get the content of these files as an attachment to the main 
> gzipped file with /rmeta or the -J option in tika-app.
> This issue proposes adding a simple gzip container detector to treat these 
> file formats as a single file.
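Here is a minimal sketch of what a gzip container detector could do, assuming it detects by magic bytes. It checks for the gzip signature (0x1f 0x8b), inflates a small prefix, and then looks for an inner signature: the "ustar" tar magic at offset 257, a leading "WARC/" version line, or an "<svg" element. The class GzipContainerSniffer, its Inner labels, and the 64 KB / 1 KB peek limits are hypothetical. The sketch uses plain java.util.zip rather than Tika's detector API, and a real implementation would return registered media types instead of these labels.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical gzip "container" sniffer: peeks at a gzip stream, inflates a small
 * prefix and looks for well-known inner signatures. Not Tika's Detector API.
 */
final class GzipContainerSniffer {

    /** Illustrative labels only, not Tika media types. */
    public enum Inner { TAR, WARC, SVG, OTHER_GZIP, NOT_GZIP }

    private static final int MAX_COMPRESSED = 64 * 1024; // assumed: compressed bytes buffered for sniffing
    private static final int MAX_INNER = 1024;           // assumed: decompressed bytes inspected

    private GzipContainerSniffer() {}

    /** Sniffs the stream, then resets it so the caller can parse it from the start. */
    public static Inner detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAX_COMPRESSED + 1);
        byte[] prefix;
        try {
            prefix = in.readNBytes(MAX_COMPRESSED);
        } finally {
            in.reset();
        }
        // Every gzip member starts with the magic bytes 0x1f 0x8b.
        if (prefix.length < 2 || (prefix[0] & 0xff) != 0x1f || (prefix[1] & 0xff) != 0x8b) {
            return Inner.NOT_GZIP;
        }
        byte[] inner = inflatePrefix(prefix, MAX_INNER);
        if (matchesAt(inner, 257, "ustar")) {  // POSIX/GNU tar header magic
            return Inner.TAR;
        }
        if (matchesAt(inner, 0, "WARC/")) {    // WARC records open with e.g. "WARC/1.0"
            return Inner.WARC;
        }
        // Heuristic only: an <svg element somewhere in the inflated prefix.
        if (new String(inner, StandardCharsets.ISO_8859_1).contains("<svg")) {
            return Inner.SVG;
        }
        return Inner.OTHER_GZIP;
    }

    /** Inflates at most maxInner bytes; a truncated prefix simply yields fewer bytes. */
    private static byte[] inflatePrefix(byte[] compressed, int maxInner) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[maxInner];
            int n;
            while (out.size() < maxInner
                    && (n = gz.read(buf, 0, maxInner - out.size())) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Truncated or corrupt data: keep whatever was inflated before the error.
        }
        return out.toByteArray();
    }

    /** True if data contains the ASCII string at the given offset. */
    private static boolean matchesAt(byte[] data, int offset, String ascii) {
        byte[] sig = ascii.getBytes(StandardCharsets.US_ASCII);
        if (data.length < offset + sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if (data[offset + i] != sig[i]) {
                return false;
            }
        }
        return true;
    }
}
```

To use it, wrap the input in a stream that supports mark/reset (for example BufferedInputStream) and call GzipContainerSniffer.detect(in). Then hand the same, already reset stream to the matching parser. The SVG check is only a substring heuristic. A production detector would probably pass the inflated prefix to the existing XML/magic detection instead.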



--
This message was sent by Atlassian Jira
(v8.20.10#820010)