[ https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856039#comment-16856039 ]

Tim Allison commented on TIKA-2790:
-----------------------------------

I was able to get a 4x improvement in speed, but that is still slower than
Optimaize and far, far slower than Yalder.  IIUC, neither Optimaize nor Yalder
processes the full string.  Rather, they sample the input or apply some kind of
stopping criterion.  I think we can work toward that in our own wrapper of
OpenNLP and, hopefully, push it back upstream into OpenNLP.
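One possible stopping criterion for such a wrapper could look roughly like the sketch below. It is a hypothetical illustration, not OpenNLP's API and not how Optimaize or Yalder actually work: `EarlyStoppingLangDetector`, `Detection`, and the detector function are stand-ins, and the criterion (score growing prefixes of the text and stop once the top language's confidence reaches a threshold) is just one assumed option.

```java
import java.util.function.Function;

/**
 * Hypothetical sketch of an early-stopping wrapper around a language detector.
 * Instead of scoring the full string, it scores growing prefixes and stops once
 * the top guess is confident enough. Not OpenNLP's API; all names are stand-ins.
 */
public class EarlyStoppingLangDetector {

    /** Hypothetical result type: language code plus a confidence in [0, 1]. */
    public record Detection(String lang, double confidence) {}

    private final Function<CharSequence, Detection> detector; // assumed underlying detector
    private final int chunkSize;        // characters added to the prefix per step
    private final double minConfidence; // stop once the top guess reaches this

    public EarlyStoppingLangDetector(Function<CharSequence, Detection> detector,
                                     int chunkSize, double minConfidence) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be > 0");
        }
        this.detector = detector;
        this.chunkSize = chunkSize;
        this.minConfidence = minConfidence;
    }

    /**
     * Returns the detection for the shortest prefix whose confidence meets the
     * threshold, or the detection for the full text if none does.
     */
    public Detection detect(CharSequence text) {
        int end = Math.min(chunkSize, text.length());
        while (true) {
            Detection d = detector.apply(text.subSequence(0, end));
            // Stop if confident enough or if the whole text has been scored.
            if (d.confidence() >= minConfidence || end == text.length()) {
                return d;
            }
            end = Math.min(end + chunkSize, text.length());
        }
    }
}
```

Re-scoring each prefix from the start is the simplest form to read; a real wrapper would more likely accumulate n-gram counts incrementally so each character is processed only once.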

> Consider switching lang-detection in tika-eval to open-nlp
> ----------------------------------------------------------
>
>                 Key: TIKA-2790
>                 URL: https://issues.apache.org/jira/browse/TIKA-2790
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: fra_mixed_100000_0.0_0.txt, langid_20190509.zip, 
> langid_20190510.zip, langid_20190514.zip, langid_20190514_plus_minus_1.zip
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
