[ https://issues.apache.org/jira/browse/TIKA-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065835#comment-18065835 ]
ASF GitHub Bot commented on TIKA-4690:
--------------------------------------
tballison merged PR #2693:
URL: https://github.com/apache/tika/pull/2693
> Add generative language model in 4.x
> ------------------------------------
>
> Key: TIKA-4690
> URL: https://issues.apache.org/jira/browse/TIKA-4690
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> I finally realized that we can play all we want with logits from the
> language detector, but that is not a great approach for "languagey/junk"
> detection. On this ticket, we'll add a generative model trained on the same
> languages as the language detector so that we can get a better sense of, for
> example, "Lang detector said Thai; how likely is it to actually be Thai?"