Carol Alexandru created TIKA-4408:
-------------------------------------
Summary: python file identified as application/x-sh under several
circumstances
Key: TIKA-4408
URL: https://issues.apache.org/jira/browse/TIKA-4408
Project: Tika
Issue Type: Bug
Components: core
Reporter: Carol Alexandru
The [definition for text/x-python in
tika-mimetypes.xml|https://github.com/apache/tika/blob/25619272d2f615df4ad87e27e7c8dec576f37627/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L8347]
is missing some matches. In particular
* #!/usr/bin/env ...
* all variants using python3 instead of python
For this reason, a file starting with any of the following valid and fairly
common lines is misidentified as application/x-sh (which matches #!/):
{{#!/usr/bin/env python3}}
{{#!/usr/bin/env python}}
{{#!/usr/bin/python3}}
{{... etc ...}}
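To make the expected behaviour concrete, here is a rough sketch in Python of the matching I have in mind. It is not how Tika detects types (Tika uses the magic entries in tika-mimetypes.xml). The regex, the function name {{guess_script_type}}, and the application/octet-stream fallback are all made up for illustration.

```python
import re

# Hypothetical pattern, NOT Tika's actual magic definition.
# Accepts an optional interpreter directory, an optional "env " prefix,
# and an optional version suffix such as "3" or "3.11".
PYTHON_SHEBANG = re.compile(
    r"^#!\s*(?:\S*/)?(?:env\s+)?python(?:\d+(?:\.\d+)*)?(?:\s|$)"
)


def guess_script_type(first_line: str) -> str:
    """Return an illustrative media type for a script's first line."""
    line = first_line.rstrip("\r\n")
    # Check the more specific Python shebang before the generic "#!/" rule.
    if PYTHON_SHEBANG.match(line):
        return "text/x-python"
    # Generic fallback that mirrors the "#!/" match described above.
    if line.startswith("#!/"):
        return "application/x-sh"
    # Assumed placeholder for "no shebang recognised".
    return "application/octet-stream"


if __name__ == "__main__":
    for shebang in (
        "#!/usr/bin/env python3",
        "#!/usr/bin/env python",
        "#!/usr/bin/python3",
        "#!/bin/sh",
    ):
        print(shebang, "->", guess_script_type(shebang))
```

The point of the sketch is the ordering: the Python-specific check has to win over the generic {{#!/}} match, otherwise every shebang line ends up as application/x-sh.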
I might do a pull request if I get around to it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)