[ https://issues.apache.org/jira/browse/TIKA-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18076269#comment-18076269 ]

ASF GitHub Bot commented on TIKA-4720:
--------------------------------------

tballison merged PR #2787:
URL: https://github.com/apache/tika/pull/2787




> Improve charset detection in 4.x, take 2
> ----------------------------------------
>
>                 Key: TIKA-4720
>                 URL: https://issues.apache.org/jira/browse/TIKA-4720
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> I had really good luck with a simple naive Bayes model with careful scaling.
>
> This ticket covers the move to that model as the main charset detector. It
> also covers work to improve our default HTML charset detector so that it gets
> some of the benefits of our StandardHtml charset detector without its
> rigidity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
