I would agree with this. In general, operating systems today store file system data (that is, file names and other metadata, not the data inside the file) in Unicode, either UTF-16 or UTF-8. On Linux this is not always the case: it could be Big5 or some other locale encoding. Linux does provide ways to find out what the “native” encoding is so that it can be used.
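As a rough illustration of asking the system for its native encoding, here is a minimal sketch. It assumes Python 3 and uses only the standard library; the helper name native_encodings() is made up for this example and is not part of SCons.

```python
import locale
import sys


def native_encodings():
    """Return (file name encoding, locale encoding) as reported by Python.

    Hypothetical helper for illustration only.
    """
    # Encoding Python uses to convert between str file names and OS bytes.
    fs_encoding = sys.getfilesystemencoding()
    # Encoding implied by the user's locale settings (e.g. LANG/LC_ALL on
    # Linux, the ANSI code page on Windows). False means "do not call
    # setlocale() as a side effect".
    locale_encoding = locale.getpreferredencoding(False)
    return fs_encoding, locale_encoding


fs_encoding, locale_encoding = native_encodings()
print("file names:", fs_encoding)
print("locale:    ", locale_encoding)
```

The two values often agree, but they answer different questions, so code that decodes file names and code that decodes program output may need different ones.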
I should note that the idea of converting binary to Unicode does not really exist. The point of a binary string is to hold arbitrary data (for example, a double in its raw 64-bit form rather than the decimal value 1.2385). One can assume that it is in a certain code page encoding and convert from that. As I stated above, there are APIs to find out what the locale code page encoding is, and that can be used to convert text to the local ANSI/OEM encoding. This is different from a binary string.

Jason

From: Scons-dev [mailto:[email protected]] On Behalf Of Gary Oberbrunner
Sent: Wednesday, May 27, 2015 7:43 AM
To: SCons developer list
Subject: Re: [Scons-dev] Merge PR #235 before release

On Wed, May 27, 2015 at 6:52 AM, anatoly techtonik <[email protected]<mailto:[email protected]>> wrote:

What I need is a bulletproof way to convert from anything to unicode. This requires some kind of escaping to go forward and back. Some helper methods like u2b() (unicode to binary) and b2u(). I am quite surprised that so far I found nothing for this "simple" case.

That's because in general the encoding of the "binary" string is unknown. Is it ascii, utf-8, Windows CP-1252, shift-JIS, or something else? You can't decode such a string to Unicode without knowing the encoding. Check out the python-3 branch where we've been working through some of those issues.

Your u2b is "easy" if you assume you want the binary to be utf-8 encoded, which is normally safe; this conversion is guaranteed to work. Your b2u is not so easy. You can't just assume utf-8 as you might think; if the string has invalid utf-8 bytes it'll raise an error or generate dummy chars depending on the args you pass to str.decode(). At the very least it'll get mangled if it's in a different encoding than you expect.

-- Gary
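To make the points in this thread concrete, here is a minimal sketch of what u2b() and b2u() could look like. It is not taken from the python-3 branch: it assumes Python 3, assumes UTF-8 as the byte encoding, and uses the "surrogateescape" error handler (which does not exist in Python 2) as the "escaping to go forward and back". The sample byte strings are made up for illustration.

```python
import struct


def u2b(text, encoding="utf-8"):
    """Unicode to bytes; undoes the escaping applied by b2u()."""
    return text.encode(encoding, "surrogateescape")


def b2u(data, encoding="utf-8"):
    """Bytes to unicode; undecodable bytes become lone surrogates
    (U+DC80..U+DCFF) instead of raising an error."""
    return data.decode(encoding, "surrogateescape")


# A double in its raw 64-bit form: 8 bytes of data, not text.
raw = struct.pack("<d", 1.2385)

# "caf\u00e9" encoded as Latin-1/CP-1252; this is not valid UTF-8.
bad = b"caf\xe9"

# Strict decoding raises an error...
try:
    bad.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ...and errors="replace" substitutes a dummy character, losing the byte.
replaced = bad.decode("utf-8", "replace")

# Decoding with the wrong code page succeeds but mangles the text.
mangled = "caf\u00e9".encode("utf-8").decode("cp1252")

# With surrogateescape the original bytes survive a round trip.
round_tripped = u2b(b2u(bad))
raw_round_tripped = u2b(b2u(raw))
```

The round trip only preserves the bytes. A string that came out of b2u() may contain lone surrogates, so it cannot be encoded with the default strict handler, and it still tells you nothing about what encoding the data was really in.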
_______________________________________________
Scons-dev mailing list
[email protected]
https://pairlist2.pair.net/mailman/listinfo/scons-dev
