[ 
https://issues.apache.org/jira/browse/TIKA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603483#comment-17603483
 ] 

Nick Burch commented on TIKA-3850:
----------------------------------

The kind of statistical language model used in Tika struggles with very short 
text. What happens if you feed in a longer block of Spanish-language text?
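
As a rough illustration of why length matters: detectors of this kind typically 
compare character n-gram counts against per-language profiles, and a single 
short sentence yields very few n-grams to compare. Closely related languages 
such as Spanish and Galician share many of them. The sketch below is a toy, 
not Tika's or Optimaize's actual implementation. The helper name char_ngrams 
and the longer passage are made up for this example; only the short sample 
comes from the issue.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (lower-cased, whitespace collapsed)."""
    normalised = " ".join(text.lower().split())
    return Counter(normalised[i:i + n] for i in range(len(normalised) - n + 1))


# The sample from the issue report.
short_text = "Hola! Donde puedo contactar para una garantía?"

# A hypothetical longer Spanish passage, invented for this illustration.
long_text = (
    "Hola! Donde puedo contactar para una garantía? Compré el producto hace "
    "dos meses y ha dejado de funcionar. Me gustaría saber qué documentos "
    "necesito enviar y cuánto tiempo tarda la reparación."
)

short_profile = char_ngrams(short_text)
long_profile = char_ngrams(long_text)

# The short sample gives the model far less evidence to work with.
print("distinct trigrams, short text:", len(short_profile))
print("distinct trigrams, long text: ", len(long_profile))
```

With so few trigrams available, a small number of n-grams that happen to be 
more frequent in one profile can plausibly dominate the score, which is why 
retrying with a longer block of text is a useful first check.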

> Spanish text is incorrectly detected as Galician
> ------------------------------------------------
>
>                 Key: TIKA-3850
>                 URL: https://issues.apache.org/jira/browse/TIKA-3850
>             Project: Tika
>          Issue Type: Bug
>          Components: languageidentifier
>    Affects Versions: 2.4.1
>         Environment: org.apache.tika:tika-langdetect-optimaize:2.4.1
> org.apache.tika:tika-core:2.4.1
>            Reporter: Lenne Hendrickx
>            Priority: Minor
>
> The following Spanish text is incorrectly detected as Galician.
> {noformat}
> Hola! Donde puedo contactar para una garantía?{noformat}
> The es and gl models are loaded into the language detector.
> Language result:
> {noformat}
> language: gl
> score: 0.999995{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
