[ https://issues.apache.org/jira/browse/TIKA-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174130#comment-17174130 ]
chenshuming commented on TIKA-3153:
-----------------------------------

This seems to be related to the following match rule in tika-mimetypes.xml (line 6163):
{code:java}
<match value="\nReceived:" type="stringignorecase" offset="0:1000"/>
{code}
https://github.com/apache/tika/blob/main/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L6163

> Text File identified as message/rfc822
> --------------------------------------
>
>                 Key: TIKA-3153
>                 URL: https://issues.apache.org/jira/browse/TIKA-3153
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.24.1
>            Reporter: Akash
>            Priority: Major
>         Attachments: TextFileIdentifiedAsMessage.txt
>
>
> Text file containing the word Received: is identified as message/rfc822.
> We were earlier using version 1.9 and it used to identify file type properly
> as text/plain.
> Even if multiple lines are there, if one line with Received: is present,
> content type is incorrectly identified.
> To check we can run java -jar tika-app-1.24.1.jar
> TextFileIdentifiedAsMessage.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
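To illustrate why the match rule quoted in the comment can catch an ordinary text file, here is a minimal sketch of how such a rule might behave. It assumes the rule fires when the pattern starts at any byte offset from 0 through 1000 and that case is ignored. This is a simplified approximation for discussion, not Tika's actual detector code; the class name `ReceivedMagicSketch` and its method are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ReceivedMagicSketch {
    // Values copied from the quoted <match> element.
    static final String PATTERN = "\nReceived:";
    static final int OFFSET_START = 0;
    static final int OFFSET_END = 1000;

    /**
     * Returns true if PATTERN starts, ignoring case, at any offset in
     * [OFFSET_START, OFFSET_END]. Assumed semantics, not Tika's implementation.
     */
    public static boolean matches(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets.
        String text = new String(data, StandardCharsets.ISO_8859_1).toLowerCase(Locale.ROOT);
        String needle = PATTERN.toLowerCase(Locale.ROOT);
        int last = Math.min(OFFSET_END, text.length() - needle.length());
        for (int i = OFFSET_START; i <= last; i++) {
            if (text.startsWith(needle, i)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A plain note whose second line happens to begin with "received:".
        byte[] plainNote = "Shipping notes\nreceived: 12 boxes on Monday\n"
                .getBytes(StandardCharsets.US_ASCII);
        // The same word on the first line has no preceding newline.
        byte[] firstLine = "Received: 12 boxes\n".getBytes(StandardCharsets.US_ASCII);

        System.out.println(matches(plainNote));
        System.out.println(matches(firstLine));
    }
}
```

Under these assumptions, the first sample matches because a line other than the first begins with "received:", while the second does not because the pattern requires a preceding newline. Other rules in tika-mimetypes.xml may still apply to either input.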