[ https://issues.apache.org/jira/browse/TIKA-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075848#comment-18075848 ]
ASF GitHub Bot commented on TIKA-4720:
--------------------------------------
tballison merged PR #2786:
URL: https://github.com/apache/tika/pull/2786
> Improve charset detection in 4.x, take 2
> ----------------------------------------
>
> Key: TIKA-4720
> URL: https://issues.apache.org/jira/browse/TIKA-4720
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> I had really good luck with a simple naive Bayes model and careful scaling.
>
> This ticket covers the move to naive Bayes as the main charset detector. It
> also covers work to improve our default HTML charset detector so that it gets
> some of the benefits of our StandardHtml charset detector without its
> rigidity.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)