[ 
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839418#comment-16839418
 ] 

Tim Allison commented on TIKA-2790:
-----------------------------------

I'm going to kick off a run where I add noise by adding or subtracting one 
from a given codepoint, so that the noise won't be chosen from between 0 and 
1,000,000 but should instead fall within the Unicode block of the actual 
language.  Results tomorrow.
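
For anyone following along, the perturbation could be sketched roughly as 
below. This is only an illustration under stated assumptions, not the code 
used for the run: the class name CodepointNoise, the noiseRate parameter, and 
the choice to leave a codepoint untouched when the shifted value would be 
invalid or a surrogate are all hypothetical.

```java
import java.util.Random;

// Hypothetical helper, not part of tika-eval.
final class CodepointNoise {

    // Returns a copy of text in which each codepoint is shifted by +1 or -1
    // with probability noiseRate (an assumed parameter).
    static String addNoise(String text, double noiseRate, Random random) {
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (random.nextDouble() < noiseRate) {
                cp = shift(cp, random.nextBoolean() ? 1 : -1);
            }
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    // Shifts a codepoint by delta; keeps the original if the result would be
    // out of range or land in the surrogate range (assumed edge-case policy).
    static int shift(int cp, int delta) {
        int candidate = cp + delta;
        boolean surrogate = candidate >= Character.MIN_SURROGATE
                && candidate <= Character.MAX_SURROGATE;
        if (!Character.isValidCodePoint(candidate) || surrogate) {
            return cp;
        }
        return candidate;
    }
}
```

A neighbouring codepoint will usually, though not always, sit in the same 
Unicode block as the original; this sketch does not check block boundaries.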

If anyone has recommendations to improve the methodology or reporting...or if 
I've made errors in wrapping any of these lang detectors and/or mapping 
languages, please let me know.

> Consider switching lang-detection in tika-eval to open-nlp
> ----------------------------------------------------------
>
>                 Key: TIKA-2790
>                 URL: https://issues.apache.org/jira/browse/TIKA-2790
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: langid_20190509.zip, langid_20190510.zip, 
> langid_20190514.zip
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
