[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500143#comment-17500143 ]
Naama Hophstatder commented on TIKA-3684:
-----------------------------------------

I see the results of the /rmeta endpoint and understand where the issue comes from, but as far as I understand, the emf/wmf attachments have no "text" meaning in this situation, so I want to disable the related parsers.
Can you give me an example of how I can turn them off? And does anything change when running in Docker container mode?
Thanks in advance.

> Extract text returns the text multiple times
> --------------------------------------------
>
>                 Key: TIKA-3684
>                 URL: https://issues.apache.org/jira/browse/TIKA-3684
>             Project: Tika
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 2.1.0
>            Reporter: Naama Hophstatder
>            Priority: Major
>         Attachments: example.docx, example.json
>
>
> We are using the Tika Docker container as a Linux service. When I want to extract
> text from a Word document, e.g.:
> curl -T example.docx http://localhost:9998/tika --header "Accept: text/plain"
> we get the text 3 times.
> Notice: We also have Tika server v1.14, and this version returns the text
> just as expected.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)