[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann reassigned TIKA-2520:
---------------------------------------

    Assignee: Chris A. Mattmann

> OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request
> ------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2520
>                 URL: https://issues.apache.org/jira/browse/TIKA-2520
>             Project: Tika
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 1.16
>            Reporter: Vincent van Donselaar
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>              Labels: performance
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Tika REST server's `/language` resource invokes the relatively heavy
> `loadModels` operation for every language detect call:
> {code:title=LanguageResource.java}
>     public String detect(final String string) throws IOException {
>         LanguageResult language = new OptimaizeLangDetector().loadModels().detect(string);
>         String detectedLang = language.getLanguage();
>         LOG.info("Detecting language for incoming resource: [{}]", detectedLang);
>         return detectedLang;
>     }
> {code}
> This could be optimized by loading the models only once (perhaps lazily) and
> keeping them in memory. I assume the `LanguageDetector` is not thread safe, so
> I expect this requires an ExecutorService with language detectors.
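
One way to picture the "load once, keep in memory" idea from the description is sketched below. It is only an illustration, not the fix applied in Tika. To stay runnable without Tika on the classpath it uses a hypothetical `StubDetector` class (with a hypothetical `LOADS` counter and a toy word list) in place of `OptimaizeLangDetector`. It also uses a `ThreadLocal` instead of the ExecutorService mentioned in the description. That is a simpler alternative which still honors the reporter's assumption that a detector instance is not thread safe. Each request thread lazily creates one detector and reuses it afterwards.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class LazyDetectorSketch {

    /** Hypothetical stand-in for a detector with an expensive loadModels() step. */
    static final class StubDetector {
        /** Counts how often models were loaded, across all instances. */
        static final AtomicInteger LOADS = new AtomicInteger();

        // Toy "model": a few Dutch words. Purely illustrative.
        private static final Set<String> DUTCH_HINTS =
                new HashSet<>(Arrays.asList("het", "een", "dag"));

        // Mutable per-instance state: the reason an instance is not shared across threads.
        private final StringBuilder buffer = new StringBuilder();

        StubDetector loadModels() {
            LOADS.incrementAndGet(); // stands in for the expensive model loading
            return this;
        }

        String detect(String text) {
            buffer.setLength(0);
            buffer.append(text.toLowerCase());
            for (String word : buffer.toString().split("\\s+")) {
                if (DUTCH_HINTS.contains(word)) {
                    return "nl";
                }
            }
            return "en";
        }
    }

    // One detector per thread, created and loaded on that thread's first request only.
    private static final ThreadLocal<StubDetector> DETECTOR =
            ThreadLocal.withInitial(() -> new StubDetector().loadModels());

    static String detect(String text) {
        return DETECTOR.get().detect(text);
    }

    public static void main(String[] args) {
        System.out.println(detect("the quick brown fox"));
        System.out.println(detect("het is een mooie dag"));
        System.out.println("model loads: " + StubDetector.LOADS.get());
    }
}
```

In this sketch, repeated calls on the same thread reuse the already loaded detector, so the load counter grows with the number of request threads rather than with the number of requests. The trade-off is one copy of the detector state per thread. A bounded pool, as suggested in the description, would cap that at the cost of more coordination.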