[
https://issues.apache.org/jira/browse/TIKA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18072016#comment-18072016
]
Hudson commented on TIKA-4705:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #1297 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/1297/])
TIKA-4705 -- resourceName of nested tarball should not contain the parent
directories of its parent gzip file, plus fixing typo where '.' was missing
from gz extension (#2750) (github:
[https://github.com/apache/tika/commit/cff5a735d849d3f05f8a411f9502b36b372361f7])
* (edit)
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/main/java/org/apache/tika/parser/pkg/CompressorParser.java
* (add)
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/test/resources/test-documents/test-nested-tarball.tar
* (edit)
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java
> resourceName of tar file in nested tarball should not contain tarball's
> parent directories
> ------------------------------------------------------------------------------------------
>
> Key: TIKA-4705
> URL: https://issues.apache.org/jira/browse/TIKA-4705
> Project: Tika
> Issue Type: Improvement
> Reporter: Iachimoe
> Priority: Major
>
> Example structure:
> test-nested-tarball.tar contains:
> folderContainingTgz/inner/nested.tgz
>
> The resource name for nested.tgz would be
> `folderContainingTgz/inner/nested.tgz`, which is consistent with the general
> behaviour for nested archives (e.g. zips).
> However, if nested.tgz does not contain metadata specifying the name of the
> file nested within it, then that file will have a resourceName of
> `folderContainingTgz/inner/nested.tar`. This is inconsistent with how other
> nested archives behave, because parent folders are generally only included
> if they relate to the immediate parent archive. The parent archive of
> nested.tgz in this example is test-nested-tarball.tar, which is why it makes
> sense for the folders to be included. However, the parent archive of
> nested.tar is nested.tgz, and there is no folder called folderContainingTgz
> within nested.tgz.
>
> Draft pull request with a unit test that hopefully makes the issue clear, and
> a proposed fix at https://github.com/apache/tika/pull/2730/changes
>
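For illustration only, the naming rule requested in the issue might be sketched as below. This is not the code from the commit: the class `NestedResourceNames` and the method `innerNameFor` are hypothetical names, and the sketch assumes the gzip stream carries no filename metadata, so the inner name has to be derived from the compressed file's own resource name. Extensions are matched with the leading '.', in line with the typo fix mentioned in the commit message.

```java
import java.util.Locale;

public final class NestedResourceNames {

    private NestedResourceNames() {
    }

    /**
     * Hypothetical helper: derives a name for the file inside a gzip-compressed
     * entry when the gzip header does not supply one. Parent directories are
     * dropped because they belong to the outer archive, not to the gzip file.
     */
    static String innerNameFor(String compressedResourceName) {
        // Keep only the last path segment (handles both '/' and '\').
        int slash = Math.max(compressedResourceName.lastIndexOf('/'),
                compressedResourceName.lastIndexOf('\\'));
        String base = compressedResourceName.substring(slash + 1);
        String lower = base.toLowerCase(Locale.ROOT);

        // ".tgz" is shorthand for ".tar.gz", so the inner file is a tarball.
        if (lower.endsWith(".tgz")) {
            return base.substring(0, base.length() - 4) + ".tar";
        }
        // Match ".gz" including the dot, so a name such as "archivegz" is left alone.
        if (lower.endsWith(".gz")) {
            return base.substring(0, base.length() - 3);
        }
        return base;
    }

    public static void main(String[] args) {
        System.out.println(innerNameFor("folderContainingTgz/inner/nested.tgz"));
    }
}
```

With the example from the issue, the entry `folderContainingTgz/inner/nested.tgz` would yield `nested.tar` rather than `folderContainingTgz/inner/nested.tar`.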
--
This message was sent by Atlassian Jira
(v8.20.10#820010)