Hi,

I have not looked into the latest Tika language detectors, but much has changed
since 1.x.
Their LanguageDetector base class
(https://tika.apache.org/3.2.3/api/org/apache/tika/language/detect/LanguageDetector.html)
now has several known implementations:
Lingo24LangDetector, OpenNLPDetector, OptimaizeLangDetector, TextLangDetector,
TikaLanguageDetector

So another direction we could take is to base our 'langid' module entirely on 
Tika's framework instead of our home-grown... It's not really our job to 
maintain a language detector :)

But perhaps that is food for Solr 11, and deprecating the Tika 1.x detector in
our langid module is sound in any case. Perhaps it comes back in 10.x if you
bring in TikaPipes 3.x. Don't know.

Jan

> On 16 Oct 2025, at 02:10, David Eric Pugh <[email protected]> wrote:
> 
> I had successfully upgraded it to Tika 3 in a PR... However, the point you
> made about it being "the oldest of three ways" sways me to the "let's remove
> it" side of the discussion.
>    On Tuesday, October 14, 2025 at 07:57:41 PM EDT, Jan Høydahl 
> <[email protected]> wrote:  
> 
> Hi,
> 
> I propose deprecating TikaLanguageIdentifierUpdateProcessor in 9.10 and
> removing it in 10.0.
> See https://issues.apache.org/jira/browse/SOLR-17958 for details.
> 
> If there are no objections by the end of Thursday, I'll proceed with this.
> 
> Jan
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
