Hi all,

I am experimenting with the langdetect engine, and the rate of correct detections is pretty bad. For example, the short sentence "Today is a good day":

- sent as plain text via the web interface (http://localhost:8080/enhancer/chain/default) is detected as "so" = Somali
- put into a *.docx and sent via REST (curl -X POST -H "Accept: application/json" -H "Content-type: text/plain" -T test3.docx http://localhost:8080/enhancer/chain/default) is detected as "bn" = Bengali

I am using the default chain of the current trunk. What am I doing wrong? I tried a few texts (including longer ones such as Wikipedia articles), and in most cases (about 80%) the detection is wrong, and it also differs between the Word document and the plain text. If I call the Tika engine itself, the extracted text is the same as the plain text ...

Any hints?
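
In case it helps to reproduce this, here is a minimal sketch of the two REST calls as a small shell helper. It assumes the same endpoint and headers as the curl command above; the names ENDPOINT, post_file and sentence.txt are my own for illustration and are not part of Stanbol.

```shell
#!/bin/sh
# Assumed endpoint: the default chain used in this post; override via ENDPOINT.
ENDPOINT="${ENDPOINT:-http://localhost:8080/enhancer/chain/default}"

# Write the test sentence to a file without a trailing newline.
printf '%s' 'Today is a good day' > sentence.txt

# Hypothetical helper: post a file to the chain with the given Content-type.
# Usage: post_file <content-type> <file>
post_file() {
    curl -s -X POST \
        -H "Accept: application/json" \
        -H "Content-type: $1" \
        -T "$2" \
        "$ENDPOINT"
}
```

With the server running, `post_file text/plain sentence.txt` sends the plain-text case and `post_file text/plain test3.docx` repeats the *.docx call exactly as I made it, so the two JSON responses can be compared side by side.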