Langdetect

Christian Herrmann Mon, 27 Mar 2017 00:32:17 -0700

Hi all,

I am experimenting with the langdetect engine and the rate of correct
detections is pretty bad. For example, the short sentence "Today is a good
day" sent as:


   - plain text via the web interface
   (http://localhost:8080/enhancer/chain/default) is detected as "so" = Somali
   - a *.docx file containing the same text, sent via REST (curl -X POST -H
   "Accept: application/json" -H "Content-type: text/plain" -T test3.docx
   http://localhost:8080/enhancer/chain/default), is detected as "bn" =
   Bengali
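To reproduce the plain-text case over REST rather than through the web interface, here is a minimal sketch using only the Python standard library. It assumes a local instance listening on the default chain URL used above and mirrors the curl headers. The helper name `build_enhancer_request` is hypothetical, chosen only for illustration.

```python
import urllib.request

# Assumption: a local instance with the default enhancement chain,
# reachable at the same URL as in the curl call above.
ENHANCER_URL = "http://localhost:8080/enhancer/chain/default"


def build_enhancer_request(text, url=ENHANCER_URL):
    """Hypothetical helper: build a POST that sends `text` as plain text
    and asks for a JSON response, mirroring the curl headers above."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # request body is the raw UTF-8 text
        headers={
            "Accept": "application/json",
            "Content-Type": "text/plain",
        },
        method="POST",
    )
```

With the server running, `urllib.request.urlopen(build_enhancer_request("Today is a good day"))` returns the response, and `.read().decode("utf-8")` gives the raw JSON. That output can then be compared with what the web interface reports for the same sentence.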

I am using the default chain of the current trunk. What am I doing wrong? I
tried a few texts (including longer ones such as Wikipedia articles), and
pretty often (80%) the detection is wrong, and it also differs between the
Word and plain-text versions. If I call the Tika engine on its own, the
extracted text is the same as the plain text. Any hints?
