Tim Allison created TIKA-4195:
---------------------------------

Summary: JSoupParser conceals null from the EncodingDetector
Key: TIKA-4195
URL: https://issues.apache.org/jira/browse/TIKA-4195
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison

The JSoupParser runs encoding detection on the input stream. If the result is null, the parser applies the default charset, US-ASCII. This behavior is OK.

The problem is that there is no way to distinguish a faulty encoding detector that reports 'US-ASCII' from the default behavior of the JSoupParser. I don't think the JSoupParser should report the fallback encoding as if it were detected.

I'm not sure how best to report this in the metadata, but we need to be able to differentiate a detected encoding from a fallback encoding.
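
To make the distinction concrete, here is a minimal sketch in plain Java of what "differentiate detection and fallback" could look like. It does not use any Tika API: the class EncodingResolution, the Result type, and the resolve method are hypothetical names invented for illustration, and how the flag would actually be surfaced in the metadata is left open, as noted above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Tika.
public class EncodingResolution {

    /** The charset to use, plus whether a detector actually reported it. */
    public static final class Result {
        public final Charset charset;
        public final boolean detected; // false means the fallback was applied

        Result(Charset charset, boolean detected) {
            this.charset = charset;
            this.detected = detected;
        }
    }

    // The fallback charset named in this issue.
    static final Charset FALLBACK = StandardCharsets.US_ASCII;

    /**
     * Wraps a detector result without concealing a null.
     *
     * @param detectorResult what the encoding detector returned; may be null
     */
    public static Result resolve(Charset detectorResult) {
        if (detectorResult == null) {
            // Nothing was detected: use the fallback, but say so.
            return new Result(FALLBACK, false);
        }
        return new Result(detectorResult, true);
    }

    public static void main(String[] args) {
        Result fallback = resolve(null);
        Result alleged = resolve(StandardCharsets.US_ASCII);
        // Same charset in both cases, but the two are now distinguishable.
        System.out.println(fallback.charset + " detected=" + fallback.detected); // US-ASCII detected=false
        System.out.println(alleged.charset + " detected=" + alleged.detected);   // US-ASCII detected=true
    }
}
```

With something along these lines, a caller sees US-ASCII in both cases but can still tell whether a detector alleged it or the parser fell back to it.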