[ https://issues.apache.org/jira/browse/TIKA-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220030#comment-17220030 ]
Peter Lee commented on TIKA-3213:
---------------------------------

This fork has not supported Chinese charset detection since version 2.0.0. See this issue: [https://github.com/albfernandez/juniversalchardet/issues/34]

It might be a problem.

> Consider migrating universalcharsetdetector to a live fork
> ----------------------------------------------------------
>
>                 Key: TIKA-3213
>                 URL: https://issues.apache.org/jira/browse/TIKA-3213
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> I just came across this living fork of the aged juniversalchardet (2011!!!):
> https://github.com/albfernandez/juniversalchardet
> It has a mozilla license, has decent star count and is published on maven
> central.
> Obv, we'll want to run a comparison on our corpus before making this change,
> but I wanted to open this issue for discussion.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)