[
https://issues.apache.org/jira/browse/TIKA-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946411#comment-17946411
]
mannixli commented on TIKA-4398:
--------------------------------
I ran your code. The output is:
Extracted? yes
X-TIKA:Parsed-By: [org.apache.tika.parser.DefaultParser,
org.apache.tika.parser.pkg.PackageParser]
X-TIKA:Parsed-By-Full-Set: [org.apache.tika.parser.DefaultParser,
org.apache.tika.parser.pkg.PackageParser,
org.apache.tika.parser.xml.DcXMLParser,
org.apache.tika.parser.image.ImageParser]
X-TIKA:detectedEncoding: ISO-8859-1
X-TIKA:encodingDetector: UniversalEncodingDetector
Content-Type: application/zip
:(
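
Since the Content-Type above came back as plain application/zip, one thing worth ruling out is whether the file itself is still a well-formed Word package. The following is only a minimal sketch using java.util.zip, not Tika. The class name OoxmlCheck and the method looksLikeDocx are hypothetical, and the two entry names are the conventional ones for a .docx, which is an assumption about 01.docx rather than something confirmed in this issue.

```java
import java.io.IOException;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Tika: checks a file for the ZIP entries
// that a Word (.docx) package conventionally contains.
public class OoxmlCheck {

    // Returns true if the ZIP at `path` holds both the content-types part
    // and the conventional main document part. Throws IOException
    // (ZipException) if the file is not a readable ZIP at all.
    static boolean looksLikeDocx(String path) throws IOException {
        try (ZipFile zip = new ZipFile(path)) {
            return zip.getEntry("[Content_Types].xml") != null
                && zip.getEntry("word/document.xml") != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OoxmlCheck <file.docx>");
            return;
        }
        System.out.println("Looks like docx? " + (looksLikeDocx(args[0]) ? "yes" : "no"));
    }
}
```

If this prints "yes" for 01.docx, the file has the usual package layout and the difference between 3.0.0 and 3.1.0 is more likely on the detection side (for example, which detector modules are on the classpath) than in the file.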
> When extracting a docx file with Tika 3.1.0, the package parser was detected
> instead of the OOXML parser
> --------------------------------------------------------------------------------------------------------
>
> Key: TIKA-4398
> URL: https://issues.apache.org/jira/browse/TIKA-4398
> Project: Tika
> Issue Type: Bug
> Components: tika-core
> Affects Versions: 3.1.0
> Environment: java17
> Reporter: mannixli
> Priority: Major
> Attachments: 01.docx, image-2025-04-16-20-46-07-228.png,
> image-2025-04-22-11-26-09-936.png, image-2025-04-22-11-27-33-655.png,
> image-2025-04-22-11-37-15-401.png
>
>
> With 3.0.0, the OOXML parser was detected.