It seems macOS normalizes Unicode in file names differently from everyone else (I think it stores the decomposed form, i.e. base character + combining mark, instead of the precomposed code point). That might affect PWD, since its value comes from a directory name. For environment variables, even if most sensible platforms should have adopted UTF-8 by now, I wouldn't be surprised if there's no official encoding whatsoever (i.e. they're just bytes with a 0 at the end…)
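To make that concrete, here is a minimal sketch in Python (not Pharo, just a convenient way to look at the bytes). It assumes the two forms involved are NFC (precomposed) and NFD (decomposed). The use of cp1252 as the "wrong" single-byte decoder is my guess, chosen only because it reproduces the garbled PWD value in the quoted output. The helper name same_text is made up for illustration.

```python
import unicodedata

# "benoît" written with an escape so the source file's own encoding
# and normalization cannot influence the result.
name = "beno\u00eet"

nfc = unicodedata.normalize("NFC", name)  # precomposed: î is U+00EE
nfd = unicodedata.normalize("NFD", name)  # decomposed: i followed by U+0302

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes)  # b'beno\xc3\xaet'
print(nfd_bytes)  # b'benoi\xcc\x82t'

# Same text for a human, but different code points and different bytes.
print(nfc == nfd)  # False

# Assumption: the consumer decodes one byte per character (cp1252 here)
# instead of decoding UTF-8. Each multi-byte sequence turns into two
# unrelated characters.
wrong_nfc = nfc_bytes.decode("cp1252")
wrong_nfd = nfd_bytes.decode("cp1252")


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(nfc, nfd))  # True
```

So anything that compares or concatenates a value typed by the user with a value derived from a file name would probably want to normalize both sides first, whatever the decoding step ends up being.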
On 17 April 2018 at 09:36, Sven Van Caekenberghe <s...@stfx.eu> wrote:

> Hi,
>
> The dictionary
>
>   OSPlatform current environment
>
> contains a copy of the OS's environment variables (more correctly of the
> VM process), as key/value pairs.
>
> These are obtained via the following system calls:
>
> on macOS & *nix
>
>   LIBC environ
>
> on Windows
>
>   KERNEL32 GetEnvironmentStrings
>
> It is however a bit unclear how these are encoded. On macOS & *nix that
> seems to be UTF8, on Windows there are some reports that it appears to be
> Latin1 - but both might be locale specific, I don't know either way.
>
> Does anyone know for sure ?
>
> I furthermore think that OSEnvironment and its subclasses, who do this
> call, should be responsible for decoding the C strings into proper Pharo
> strings, and not leave that responsibility to its users.
>
> Fundamentally, in the following, the decoding is still not done correctly
> and that is wrong/confusing IMHO.
>
> $ FOO=benoît ./pharo Pharo.image eval 'OSEnvironment current associations'
> {'TERM_PROGRAM'->'Apple_Terminal'. 'TERM'->'xterm-256color'.
> 'SHELL'->'/bin/bash'.
> 'TMPDIR'->'/var/folders/sy/sndrtj9j1tq06j0lfnshmrl80000gn/T/'.
> 'FOO'->'benoît'.
> 'Apple_PubSub_Socket_Render'->'/private/tmp/com.apple.launchd.uWk7pivcLT/Render'.
> 'TERM_PROGRAM_VERSION'->'404'.
> 'TERM_SESSION_ID'->'845BECCD-0AB0-4686-B7F9-3A0FF84BDCB7'.
> 'USER'->'sven'.
> 'SSH_AUTH_SOCK'->'/private/tmp/com.apple.launchd.y5oCwdUyaG/Listeners'.
> 'PATH'->'/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin:/opt/X11/bin'.
> 'PWD'->'/tmp/benoiÌ‚t'. 'XPC_FLAGS'->'0x0'. 'XPC_SERVICE_NAME'->'0'.
> 'HOME'->'/Users/sven'. 'SHLVL'->'2'. 'LOGNAME'->'sven'.
> 'LC_CTYPE'->'UTF-8'.
> 'DISPLAY'->'/private/tmp/com.apple.launchd.lsgASYFiWW/org.macosforge.xquartz:0'.
> 'SECURITYSESSIONID'->'186a9'. 'OLDPWD'->'/tmp/benoiÌ‚t'.
> '_'->'/tmp/benoiÌ‚t/pharo-vm/Pharo.app/Contents/MacOS/Pharo'.
> '__CF_USER_TEXT_ENCODING'->'0x1F5:0x0:0x0'}
>
> Of course, if we change this, we will need to fix callers.
>
> Opinions ?
>
> Sven
>
> PS: Furthermore, I note that there is a subtle difference in how $FOO and
> $PWD in the above are UTF-8 encoded. In the former, normalisation was done,
> in the latter not. Maybe that could lead to problems (when
> comparing/composing them). This is a difficult/complex subject
> (https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43).

-- 
Damien Pollet
type less, do more [ | ] http://people.untyped.org/damien.pollet