[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-4195.
-------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

> JSoupParser conceals null from the EncodingDetector
> ---------------------------------------------------
>
>                 Key: TIKA-4195
>                 URL: https://issues.apache.org/jira/browse/TIKA-4195
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> The JSoupParser runs encoding detection on the input stream. If the result
> is null, the parser applies the default charset, US-ASCII. This behavior is
> ok.
> The problem is that there is no way to distinguish between a faulty encoding
> detector that alleges 'US-ASCII' and the default fallback of the JSoupParser.
> I don't think the JSoupParser should report the fallback encoding as if it
> were detected.
> I'm not sure how best to report this in the metadata, but we need to be able
> to differentiate detection and fallback encoding.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
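
To make the distinction in the issue concrete, here is a minimal sketch of one way to record whether a charset was detected or applied as a fallback. It is not the change made for TIKA-4195 and does not use the Tika API: the Detector interface, the resolve method, and the "sketch:" metadata keys are all hypothetical names invented for illustration. The only behavior taken from the issue is that a null detection result falls back to US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingFallbackSketch {

    /** Hypothetical stand-in for an encoding detector; returns null when it cannot decide. */
    interface Detector {
        Charset detect(byte[] input);
    }

    // Hypothetical metadata keys, not real Tika property names.
    static final String CHARSET_KEY = "sketch:charset";
    static final String CHARSET_SOURCE_KEY = "sketch:charset-source";

    // Fallback charset as described in the issue.
    static final Charset FALLBACK = StandardCharsets.US_ASCII;

    /**
     * Runs detection and records both the charset that will be used and
     * whether it came from the detector or from the fallback.
     */
    static Map<String, String> resolve(Detector detector, byte[] input) {
        Charset detected = detector.detect(input);
        Map<String, String> metadata = new LinkedHashMap<>();
        if (detected == null) {
            // The detector gave no answer: apply the default, and say so.
            metadata.put(CHARSET_KEY, FALLBACK.name());
            metadata.put(CHARSET_SOURCE_KEY, "fallback");
        } else {
            metadata.put(CHARSET_KEY, detected.name());
            metadata.put(CHARSET_SOURCE_KEY, "detected");
        }
        return metadata;
    }

    public static void main(String[] args) {
        byte[] input = "<html></html>".getBytes(StandardCharsets.US_ASCII);
        // A detector that gives up, and one that (rightly or wrongly) alleges US-ASCII.
        System.out.println(resolve(bytes -> null, input));
        System.out.println(resolve(bytes -> StandardCharsets.US_ASCII, input));
    }
}
```

Both calls in main report US-ASCII as the charset, but the source entry differs, so a consumer of the metadata can tell a detector's claim apart from the parser's default.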