On 2018-05-29 21:13:43 +1000, Chris Angelico wrote:
> You can always solve a subset of problems. Using your own knowledge of
> German, you are able to better solve problems involving German text.
> But that doesn't make you any better than chardet at validating
> Chinese text, or Korean text, or Klingon text, or any other language
> you don't know.
But I don't have to. Chardet has to be reasonably good at identifying
any encoding. I only have to be good at identifying the encoding of
files which I need to import (or otherwise process).

Please go back to the original posting. The poster has one file which
he wants to read, and asked how to determine the encoding. He was told
categorically that this is impossible and that he must ask the source.
THIS is what I'm responding to, not the problem of finding a generic
solution which works for every possible file.

The OP has one file. He wants to read it. The very fact that he wants
to read this particular file makes it very likely that he knows
something about its contents. So he has domain knowledge, which makes
it very likely that he can distinguish a correct decoding from an
incorrect one. He probably can't distinguish Korean poetry from a
Vietnamese shopping list, but his file probably isn't either.

        hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at        | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
-- https://mail.python.org/mailman/listinfo/python-list