Ray Gauss II created TIKA-965:
---------------------------------

             Summary: Text Detection Fails on Mostly Non-ASCII UTF-8 Files
                 Key: TIKA-965
                 URL: https://issues.apache.org/jira/browse/TIKA-965
             Project: Tika
          Issue Type: Bug
          Components: general
    Affects Versions: 1.2
            Reporter: Ray Gauss II

If a file contains relatively few ASCII characters and mostly non-ASCII (8-bit) UTF-8 characters, the TextDetector and TextStatistics classes fail to detect it as text.
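
To make the kind of input described here concrete, the following is a minimal, self-contained sketch. It does not use or reproduce Tika's TextDetector or TextStatistics logic; the class `Utf8ByteProfile` and its two methods are hypothetical helpers introduced only for illustration. It shows that a short Greek sample is well-formed UTF-8 even though only a small fraction of its bytes fall in the 7-bit ASCII range, which is the situation in which a detector that leans on the ASCII share could plausibly misjudge the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Apache Tika.
final class Utf8ByteProfile {

    // Fraction of bytes with the high bit clear (7-bit ASCII, 0x00-0x7F).
    static double asciiRatio(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        int ascii = 0;
        for (byte b : data) {
            if ((b & 0x80) == 0) {
                ascii++;
            }
        }
        return (double) ascii / data.length;
    }

    // True if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isWellFormedUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Greek sample text, written with Unicode escapes so the source stays ASCII.
        String sample = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1 "
                + "\u03ba\u03cc\u03c3\u03bc\u03b5";
        byte[] bytes = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println("ASCII ratio: " + asciiRatio(bytes));
        System.out.println("Well-formed UTF-8: " + isWellFormedUtf8(bytes));
    }
}
```

A file built from text like this sample could serve as a reproduction input for the detector. The strict decode is shown only as one possible structural check on the bytes, not as a proposed fix.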