[ 
https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489789#comment-16489789
 ] 

ASF GitHub Bot commented on TIKA-2520:
--------------------------------------

chrismattmann commented on issue #237: TIKA-2520 optimize OptimaizeLangDetector 
default loadModel()
URL: https://github.com/apache/tika/pull/237#issuecomment-391857675
 
 
   Thanks @kkrugler, this looks good, so committed!
   I will push to master/2x shortly.
   ```
   nonas:tika2.0.0 mattmann$ git push -u origin branch_1x
   Counting objects: 14, done.
   Delta compression using up to 4 threads.
   Compressing objects: 100% (10/10), done.
   Writing objects: 100% (14/14), 1.72 KiB | 252.00 KiB/s, done.
   Total 14 (delta 4), reused 0 (delta 0)
   remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
   To github.com:/apache/tika.git
      cdca0f726..7e3e34caf  branch_1x -> branch_1x
   Branch 'branch_1x' set up to track remote branch 'branch_1x' from 'origin'.
   nonas:tika2.0.0 mattmann$ 
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> OptimaizeLangDetector#loadModels() should not be called for every single 
> langdetect HTTP request
> ------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2520
>                 URL: https://issues.apache.org/jira/browse/TIKA-2520
>             Project: Tika
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 1.16
>            Reporter: Vincent van Donselaar
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>              Labels: performance
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Tika REST server's `/language` resource invokes the relatively heavy 
> `loadModels` operation for every language detect call:
> {code:title=LanguageResource.java}
> public String detect(final String string) throws IOException {
>     LanguageResult language = new OptimaizeLangDetector().loadModels().detect(string);
>     String detectedLang = language.getLanguage();
>     LOG.info("Detecting language for incoming resource: [{}]", detectedLang);
>     return detectedLang;
> }
> {code}
> This could be optimized by (lazily?) loading the models only once and keeping 
> them in memory. I assume the `LanguageDetector` is not thread-safe, so I expect 
> this requires an ExecutorService with language detectors.
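
The "load once, keep in memory" idea from the description can be sketched with the initialization-on-demand holder idiom. This is only an illustration and not the change committed in the pull request: `StubDetector` and `LanguageResourceSketch` are hypothetical stand-ins (no Tika classes are used), and the sketch assumes that `detect()` is safe to call concurrently once the models are loaded. If that assumption does not hold for the real detector, a per-thread instance or a pool, as the reporter suggests, would be needed instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a detector whose model loading is expensive.
// It is NOT Tika's OptimaizeLangDetector; it only counts how often loading runs.
final class StubDetector {
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    StubDetector loadModels() {
        LOAD_COUNT.incrementAndGet(); // pretend this is the slow part
        return this;
    }

    String detect(String text) {
        // Toy rule, for illustration only.
        return text.contains("het") ? "nl" : "en";
    }
}

// Hypothetical resource class that reuses one loaded detector.
final class LanguageResourceSketch {
    // Initialization-on-demand holder: the JVM runs this initializer exactly
    // once, thread-safely, the first time Holder.INSTANCE is accessed.
    private static final class Holder {
        static final StubDetector INSTANCE = new StubDetector().loadModels();
    }

    // Assumption: detect() may be shared across threads after loading.
    static String detect(String text) {
        return Holder.INSTANCE.detect(text);
    }
}
```

With this shape, repeated calls to `LanguageResourceSketch.detect(...)` trigger `loadModels()` only on the first call, and the models are not loaded at all if the resource is never used.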



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
