[jira] [Resolved] (TIKA-4195) JSoupParser conceals null from the EncodingDetector

Tim Allison (Jira) Mon, 12 Feb 2024 10:13:03 -0800


     [ 
https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tim Allison resolved TIKA-4195.
-------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

> JSoupParser conceals null from the EncodingDetector
> ---------------------------------------------------
>
>                 Key: TIKA-4195
>                 URL: https://issues.apache.org/jira/browse/TIKA-4195
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> The JSoupParser runs encoding detection on the input stream. If the result 
> is null, the parser applies the default charset (US-ASCII). This behavior is 
> ok. 
> The problem is that there is no way to distinguish a faulty encoding 
> detector that alleges 'US-ASCII' from the default behavior of the 
> JSoupParser. I don't think the JSoupParser should report the fallback 
> encoding as if it were detected.
> I'm not sure how best to report this in the metadata, but we need to be able 
> to differentiate a detected encoding from a fallback encoding.
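
To illustrate the distinction the description asks for, here is a minimal
sketch in plain Java. It does not use the Tika API and is not the fix that was
committed for this issue. The names EncodingResolution, Source, Resolved and
resolve are hypothetical, and the detector is modeled as a plain Supplier that
returns null when nothing is detected.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

public class EncodingResolution {

    /** Where the charset came from: a detector result or the parser default. */
    public enum Source { DETECTED, FALLBACK }

    /** Pairs a charset with its origin so callers can tell the two cases apart. */
    public static final class Resolved {
        public final Charset charset;
        public final Source source;

        Resolved(Charset charset, Source source) {
            this.charset = charset;
            this.source = source;
        }
    }

    // Assumed default, mirroring the US-ASCII fallback described in the issue.
    static final Charset DEFAULT_CHARSET = StandardCharsets.US_ASCII;

    /**
     * Runs the (hypothetical) detector. A null result means nothing was
     * detected, so the default is applied and flagged as a fallback.
     */
    public static Resolved resolve(Supplier<Charset> detector) {
        Charset detected = detector.get();
        if (detected == null) {
            return new Resolved(DEFAULT_CHARSET, Source.FALLBACK);
        }
        return new Resolved(detected, Source.DETECTED);
    }

    public static void main(String[] args) {
        // A detector that alleges US-ASCII and a detector that finds nothing.
        Resolved alleged = resolve(() -> StandardCharsets.US_ASCII);
        Resolved nothing = resolve(() -> null);
        System.out.println(alleged.charset + " " + alleged.source);
        System.out.println(nothing.charset + " " + nothing.source);
    }
}
```

Both calls in main yield the same charset, but the source field differs, which
is the information that is lost when the fallback is reported as if it had
been detected. How such a flag would be surfaced in the metadata is left open
here, as it is in the description.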



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
