[
https://issues.apache.org/jira/browse/TIKA-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946283#comment-17946283
]
mannixli commented on TIKA-4398:
--------------------------------
Main code and POMs: tika.version=3.1.0. See the parsers in the log under `meta X-TIKA:Parsed-xx`.
!image-2025-04-22-11-37-15-401.png!
!image-2025-04-22-11-26-09-936.png!
!image-2025-04-22-11-27-33-655.png!
2025-04-22 11:33:01.359 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.aspect.LimitImageExtractor - embedded name [Content_Types].xml, type null
2025-04-22 11:33:01.360 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.aspect.LimitImageExtractor - embedded name _rels/.rels, type null
2025-04-22 11:33:01.360 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.aspect.LimitImageExtractor - embedded name word/document.xml, type null
2025-04-22 11:33:01.361 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.aspect.LimitImageExtractor - embedded name word/_rels/document.xml.rels, type null
2025-04-22 11:33:01.361 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.aspect.LimitImageExtractor - embedded name word/styles.xml, type null
2025-04-22 11:33:01.362 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.aspect.LimitImageExtractor - embedded name word/settings.xml, type null
2025-04-22 11:33:01.362 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.aspect.LimitImageExtractor - embedded name word/numbering.xml, type null
2025-04-22 11:33:01.363 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.aspect.LimitImageExtractor - embedded name docProps/core.xml, type null
2025-04-22 11:33:01.363 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.aspect.LimitImageExtractor - embedded name docProps/app.xml, type null
2025-04-22 11:33:01.363 [parser-1] INFO [trace_id=100865326431632558047]
c.b.n.a.s.impl.ParserServiceImpl - text_parse_over_success , meta:
X-TIKA:Parsed-By=org.apache.tika.parser.DefaultParser
X-TIKA:Parsed-By=org.apache.tika.parser.pkg.PackageParser
X-TIKA:Parsed-By-Full-Set=org.apache.tika.parser.DefaultParser
X-TIKA:Parsed-By-Full-Set=org.apache.tika.parser.pkg.PackageParser
resourceName=aaa.docx
X-TIKA:detectedEncoding=ISO-8859-1
X-TIKA:encodingDetector=UniversalEncodingDetector
Content-Type=application/vnd.openxmlformats-officedocument.wordprocessingml.document
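The embedded names in the log above ([Content_Types].xml, _rels/.rels, word/document.xml) are the zip entries of the docx container, which suggests the file was unpacked as a generic package. As a way to rule out a damaged file, the following is a minimal standalone sketch using only the JDK. The class name `DocxContainerCheck` and the three-entry heuristic are assumptions made for illustration; this is not Tika's detection logic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper, not part of Tika.
class DocxContainerCheck {

    /** Collects the entry names of a zip stream in archive order. */
    static Set<String> entryNames(InputStream in) throws IOException {
        Set<String> names = new LinkedHashSet<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    /**
     * Rough heuristic (an assumption for this sketch): a docx container
     * normally holds these three entries, all of which appear in the log.
     */
    static boolean looksLikeDocx(Set<String> names) {
        return names.contains("[Content_Types].xml")
                && names.contains("_rels/.rels")
                && names.contains("word/document.xml");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocxContainerCheck.java <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            Set<String> names = entryNames(in);
            names.forEach(System.out::println);
            System.out.println("looks like docx: " + looksLikeDocx(names));
        }
    }
}
```

If this reports a docx layout for aaa.docx, the container itself is probably intact, and the classpath or parser configuration would be the next place to look.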
> When extracting a docx file with Tika 3.1.0, the package parser was detected
> instead of the OOXML parser
> --------------------------------------------------------------------------------------------------------
>
> Key: TIKA-4398
> URL: https://issues.apache.org/jira/browse/TIKA-4398
> Project: Tika
> Issue Type: Bug
> Components: tika-core
> Affects Versions: 3.1.0
> Environment: java17
> Reporter: mannixli
> Priority: Major
> Attachments: 01.docx, image-2025-04-16-20-46-07-228.png,
> image-2025-04-22-11-26-09-936.png, image-2025-04-22-11-27-33-655.png,
> image-2025-04-22-11-37-15-401.png
>
>
> With 3.0.0, the OOXML parser was detected.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)