[
https://issues.apache.org/jira/browse/TIKA-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063780#comment-18063780
]
Hudson commented on TIKA-4685:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #1243 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/1243/])
TIKA-4685 - add annotation processor for jdk >23 (#2679) (github:
[https://github.com/apache/tika/commit/b3023c47bcddc80e85e450f2801c5386859e74f8])
* (edit) tika-encoding-detectors/pom.xml
> Add a new charset detector for 4.x
> ----------------------------------
>
> Key: TIKA-4685
> URL: https://issues.apache.org/jira/browse/TIKA-4685
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> While I was building out the maxent model for the updated language detector,
> I realized we had the resources (language files by language) and a maxent
> model just sitting around and ready to build a new charset detector based on
> byte ngrams.
> I have something working that appears to be quite good. We can replace both
> universal and icu4j. There's a chance that the results are hallucinated or
> that there's something surprising going on, but I think we should merge this
> and see what happens on our regression set.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)