Damien Pollet wrote
> It seems macOS normalizes UTF-8 differently from everyone else in file
> names (I think base character + combining character instead of a
> precomposed codepoint). That might affect PWD.
> For environment variables, even if most sensible platforms should have
> adopted UTF-8 by now, I wouldn't be surprised if there's no official
> encoding whatsoever (i.e. they're just bytes with a 0 at the end…)
> 
> On 17 April 2018 at 09:36, Sven Van Caekenberghe <sven@> wrote:
> 
>> Hi,
>>
>> The dictionary
>>
>>  OSPlatform current environment
>>
>> contains a copy of the OS's environment variables (more precisely, those of
>> the VM process), as key/value pairs.
>>
>> These are obtained as follows:
>>
>> on macOS & *nix
>>
>>   LIBC environ
>>
>> on Windows
>>
>>   KERNEL32 GetEnvironmentStrings
>>
>> It is, however, a bit unclear how these are encoded. On macOS and *nix they
>> seem to be UTF-8; on Windows, some reports suggest Latin-1. Both might be
>> locale-specific, though; I don't know either way.
>>
>> Does anyone know for sure?
>>
>> Furthermore, I think that OSEnvironment and its subclasses, which make this
>> call, should be responsible for decoding the C strings into proper Pharo
>> strings, rather than leaving that responsibility to their users.
>>
>> In the following example, the decoding is still not done correctly, which
>> is wrong and confusing IMHO.
>>
>> $ FOO=benoît ./pharo Pharo.image eval 'OSEnvironment current associations'
>> {'TERM_PROGRAM'->'Apple_Terminal'. 'TERM'->'xterm-256color'.
>> 'SHELL'->'/bin/bash'. 'TMPDIR'->'/var/folders/sy/sndrtj9j1tq06j0lfnshmrl80000gn/T/'. 'FOO'->'benoît'.
>> 'Apple_PubSub_Socket_Render'->'/private/tmp/com.apple.launchd.uWk7pivcLT/Render'.
>> 'TERM_PROGRAM_VERSION'->'404'.
>> 'TERM_SESSION_ID'->'845BECCD-0AB0-4686-B7F9-3A0FF84BDCB7'.
>> 'USER'->'sven'.
>> 'SSH_AUTH_SOCK'->'/private/tmp/com.apple.launchd.y5oCwdUyaG/Listeners'.
>> 'PATH'->'/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin:/opt/X11/bin'.
>> 'PWD'->'/tmp/benoît'. 'XPC_FLAGS'->'0x0'. 'XPC_SERVICE_NAME'->'0'.
>> 'HOME'->'/Users/sven'. 'SHLVL'->'2'. 'LOGNAME'->'sven'.
>> 'LC_CTYPE'->'UTF-8'. 'DISPLAY'->'/private/tmp/com.apple.launchd.lsgASYFiWW/org.macosforge.xquartz:0'.
>> 'SECURITYSESSIONID'->'186a9'. 'OLDPWD'->'/tmp/benoît'.
>> '_'->'/tmp/benoît/pharo-vm/Pharo.app/Contents/MacOS/Pharo'.
>> '__CF_USER_TEXT_ENCODING'->'0x1F5:0x0:0x0'}
>>
>> Of course, if we change this, we will need to fix callers.
>>
>> Opinions ?
>>
>> Sven
>>
>> PS: Furthermore, I note that there is a subtle difference in how $FOO and
>> $PWD above are UTF-8 encoded: the former was normalised, the latter was
>> not. That might lead to problems when comparing or composing them. This is
>> a difficult and complex subject (see
>> https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43).
>>
> 
> 
> -- 
> Damien Pollet
> type less, do more [ | ] http://people.untyped.org/damien.pollet

If by different you mean that it actually normalizes file names, then yes.
All Mac file names are in a well-defined form: NFD.
On Linux, they're just arrays of bytes, and anything goes.
That the bytes mostly happen to be valid UTF-8 strings in NFC is just a
by-product of that being the format most programs use when calling the
file primitives.
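To make the NFC/NFD difference concrete outside Pharo, here is a minimal Python sketch (Python is used only for illustration; the helpers `decode_env_bytes` and `same_text` are hypothetical names, not Pharo API). It assumes the raw environment values are UTF-8, which, as discussed above, nobody guarantees, so invalid byte sequences are kept via Python's `surrogateescape` error handler and can be turned back into the original bytes. It then shows that "benoît" typed at a terminal (precomposed, NFC) and the same name in decomposed (NFD) form differ as codepoint sequences and only compare equal after normalising both.

```python
import unicodedata


def decode_env_bytes(raw: bytes) -> str:
    """Decode a raw environment value, ASSUMING it is UTF-8.

    Invalid sequences become lone surrogates ("surrogateescape"), so the
    original bytes can be recovered exactly when re-encoding.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "benoît" as typed: "î" is one precomposed codepoint, U+00EE (NFC).
foo_bytes = "beno\u00eet".encode("utf-8")
# The same path in NFD: "i" followed by U+0302 COMBINING CIRCUMFLEX ACCENT.
pwd_bytes = unicodedata.normalize("NFD", "/tmp/beno\u00eet").encode("utf-8")

foo = decode_env_bytes(foo_bytes)
pwd = decode_env_bytes(pwd_bytes)

print(foo_bytes)                      # b'beno\xc3\xaet'
print(pwd_bytes)                      # b'/tmp/benoi\xcc\x82t'
print(pwd.endswith(foo))              # False: the codepoint sequences differ
print(same_text(pwd, "/tmp/" + foo))  # True once both are NFC

# Bytes that are not valid UTF-8 (here Latin-1 "café") still round-trip.
latin1_bytes = b"caf\xe9"
odd = decode_env_bytes(latin1_bytes)
print(odd.encode("utf-8", errors="surrogateescape") == latin1_bytes)  # True
```

The sketch normalises only for comparison and leaves the decoded values untouched, since a path handed back to the file system may need its original form. On POSIX systems the raw bytes could come from `os.environb` (not available on Windows).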

Cheers,
Henry



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
