Hi,

I have not looked into the latest Tika language detectors, but much has changed since 1.x. Their LanguageDetector base class (https://tika.apache.org/3.2.3/api/org/apache/tika/language/detect/LanguageDetector.html) now has several known implementations: Lingo24LangDetector, OpenNLPDetector, OptimaizeLangDetector, TextLangDetector, TikaLanguageDetector.

So another direction we could take is to base our 'langid' module entirely on Tika's framework instead of our home-grown... It's not really our job to maintain a language detector :) But perhaps that is food for Solr 11, and deprecating the Tika 1.x detector in our langid module is sound in any case. Perhaps it comes back in 10.x if you bring in TikaPipes 3.x. Don't know.

Jan

> On 16 Oct 2025, at 02:10, David Eric Pugh <[email protected]> wrote:
>
> I had successfully upgraded it to Tika 3 in a PR... However, the point you
> made about it being "the oldest of three ways" sways me to the "let's remove
> it" side of the discussion.
>
> On Tuesday, October 14, 2025 at 07:57:41 PM EDT, Jan Høydahl
> <[email protected]> wrote:
>
> Hi,
>
> I propose deprecating TikaLanguageIdentifierUpdateProcessor in 9.10 and
> removing it in 10.0.
> See https://issues.apache.org/jira/browse/SOLR-17958 for details.
>
> If there are no objections by the end of Thursday, I'll proceed with this.
>
> Jan
