On Tue, May 29, 2018 at 6:34 PM, Peter J. Holzer <hjp-pyt...@hjp.at> wrote:
> On 2018-05-23 06:03:38 +0000, Steven D'Aprano wrote:
>> Mojibake is especially difficult to deal with when you are dealing with
>> short text snippets like file names or user names which can contain
>> arbitrary characters, where there is rarely any way to recognise the
>> "correct" string.
>
> For single file names or user names, sure. But if you have a list of
> them, there is still a high probability that many of them will contain
> recognizable words which can be used to deduce the (or a) correct
> encoding. (Unless it's from the Ministry of Silly Names).

Ohh... are you assuming that, in a list of file names, all of them use
the same encoding? Ah, yes, well, that WOULD make it easier, wouldn't
it. Sadly, not the case.

ChrisA
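
For illustration only, here is a minimal sketch of what handling a list with
mixed encodings might look like: each name is decoded on its own instead of
guessing one encoding for the whole list. The helper name decode_name, the
candidate order, and the sample names are all assumptions made up for this
example, not anything from the thread. Note that a clean decode does not
prove the guess is right; it only means the bytes were valid in that encoding.

```python
def decode_name(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly.

    Hypothetical helper: the candidate list and its order are assumptions.
    UTF-8 goes first because it rejects most byte strings that are not UTF-8.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Reached only if no candidate matched (latin-1 itself never fails).
    return raw.decode("utf-8", errors="replace"), None


# Made-up sample: the same name stored under two encodings, plus plain ASCII.
names = [
    "caf\u00e9.txt".encode("utf-8"),
    "caf\u00e9.txt".encode("latin-1"),
    b"plain.txt",
]

# Decode every name independently; no single encoding fits the whole list.
decoded = [decode_name(n) for n in names]
for text, enc in decoded:
    print(enc, text)
```

Short names like these carry very little evidence, so a per-name guess of
this kind can easily pick a plausible but wrong encoding.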