I would agree with this.

In general, operating systems today store file system data (i.e. the file 
system's own data, such as file names, not the data inside the files) in 
Unicode (be it UTF-16 or UTF-8). On Linux this is not always the case; it 
could be Big5 or some other locale encoding. On Linux there are means to find 
out what the "native" encoding is so that you can use it.
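A minimal sketch of how a Python 3 program can ask for those "native" encodings. The thread names no specific API, so the two standard-library calls below are an assumption about which "means" are meant:

```python
import locale
import sys

# Encoding Python uses to convert file names to and from bytes
# (e.g. for os.listdir() and open()). Typically "utf-8" on modern
# Linux and Windows, but on Linux it follows the locale settings.
fs_encoding = sys.getfilesystemencoding()

# Encoding preferred by the current locale: on Linux this comes from
# LANG/LC_CTYPE (which could be e.g. Big5); on Windows it is typically
# the ANSI code page, such as cp1252.
locale_encoding = locale.getpreferredencoding(False)

print("file system encoding:", fs_encoding)
print("locale encoding:", locale_encoding)
```

Passing False to getpreferredencoding() avoids calling setlocale() as a side effect; the result reflects whatever locale the process already has.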

I should note that the idea of converting binary to Unicode does not really 
exist. The point of a binary string is to hold arbitrary data (e.g. a double 
in its raw 64-bit form vs. its decimal value of 1.2385). One can assume the 
bytes are in a certain code page encoding and convert from that. And as I 
stated above, there are APIs to find out what the locale's code page encoding 
is, and that can be used to convert text to and from the local ANSI/OEM 
encoding. This is different from a binary string.
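To make the distinction concrete, here is a small Python 3 sketch (the language is an assumption; the email is language-neutral) using the standard struct module. The 64-bit value of 1.2385 is just eight bytes, and "decoding" them as text depends entirely on which encoding you assume:

```python
import struct


def raw_double(value: float) -> bytes:
    """Return the raw 64-bit IEEE 754 encoding of a float (little-endian)."""
    return struct.pack("<d", value)


raw = raw_double(1.2385)
print(len(raw), raw.hex())  # eight bytes of binary data, not characters

# Assuming UTF-8: these particular bytes are not valid UTF-8, so a strict
# decode fails and a lenient one inserts U+FFFD replacement characters.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# Assuming Latin-1 (one code point per byte): decoding always "succeeds",
# but the resulting 8 characters carry no meaning as text.
print(repr(raw.decode("latin-1")))

# The number itself is recovered by unpacking, not by any text decoding.
print(struct.unpack("<d", raw)[0])
```

The float round-trips through struct.unpack(), not through a codec; that is the sense in which a binary string has no Unicode form of its own.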

Jason



From: Scons-dev On Behalf Of Gary Oberbrunner
Sent: Wednesday, May 27, 2015 7:43 AM
To: SCons developer list
Subject: Re: [Scons-dev] Merge PR #235 before release


On Wed, May 27, 2015 at 6:52 AM, anatoly techtonik wrote:
What I need is a bulletproof way to convert from anything to unicode. This
requires some kind of escaping to go forward and back, with helper methods
like u2b() (unicode to binary) and b2u() (binary to unicode). I am quite
surprised that so far I have found nothing for this "simple" case.

That's because in general the encoding of the "binary" string is unknown.  Is 
it ascii, utf-8, Windows CP-1252, shift-JIS, or something else?  You can't 
decode such a string to Unicode without knowing the encoding.  Check out the 
python-3 branch where we've been working through some of those issues.  Your 
u2b is "easy" if you assume you want the binary to be utf-8 encoded, which is 
normally safe; this conversion is guaranteed to work.  Your b2u is not so easy.
You can't just assume utf-8 as you might think; if the string has invalid 
utf-8 bytes it'll raise an error or generate replacement chars depending on the 
args you pass to str.decode().  And if it's in a different encoding than you 
expect, it'll get mangled even when it does decode.
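A minimal Python 3 sketch of the point above. u2b() and b2u() are the hypothetical helper names from the quoted request, not an existing API, and this b2u() simply assumes UTF-8. The last lines use Python's surrogateescape error handler (PEP 383), which the thread does not mention, to show one form of the forward-and-back escaping that was asked for:

```python
def u2b(text: str) -> bytes:
    """Hypothetical helper: encode text as UTF-8.

    Succeeds for any ordinary Unicode text (lone surrogates aside).
    """
    return text.encode("utf-8")


def b2u(data: bytes, errors: str = "strict") -> str:
    """Hypothetical helper: decode bytes *assuming* they are UTF-8.

    errors="strict" raises UnicodeDecodeError on invalid input;
    errors="replace" substitutes U+FFFD for each invalid sequence.
    """
    return data.decode("utf-8", errors)


# "café" as written by a Windows CP-1252 system: "é" is the single byte 0xE9,
# which is not valid UTF-8 on its own.
cp1252_bytes = "café".encode("cp1252")

try:
    b2u(cp1252_bytes)
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

print("replace:", repr(b2u(cp1252_bytes, errors="replace")))

# Valid bytes in an unexpected encoding decode "successfully" but mangled:
# UTF-8 "é" (0xC3 0xA9) read as CP-1252 becomes "Ã©".
print("mangled:", "café".encode("utf-8").decode("cp1252"))

# surrogateescape maps each undecodable byte to a lone surrogate, so the
# original bytes can always be restored on the way back.
escaped = cp1252_bytes.decode("utf-8", "surrogateescape")
roundtrip = escaped.encode("utf-8", "surrogateescape")
print("round trip ok:", roundtrip == cp1252_bytes)
```

surrogateescape guarantees that the original bytes come back, but the escaped string holds a lone surrogate where the "é" should be, so it makes the conversion reversible without telling you what the bytes actually mean; the unknown-encoding problem remains.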

--
Gary
_______________________________________________
Scons-dev mailing list
[email protected]
https://pairlist2.pair.net/mailman/listinfo/scons-dev
