Re: [Pharo-users] Ridiculous we are

Alain Rastoul Wed, 24 Sep 2014 14:00:48 -0700


On 24/09/2014 19:09, Benjamin Pollack wrote:

> If Pharo used ByteArrays to represent paths, with convenience methods for
> working with UTF-8 (since I do agree that's the most likely thing for a
> user/dev to want), then you'd be able to work with all files no matter what,
> *and* have a convenient way of doing so for the common case.

Hi Ben,

I strongly disagree with you on this point: using byte arrays (or bytestrings) is a pain in an international context.

The OS knows about its encoding: the locale on Unix, the code page on Windows.

Windows code pages depend on the country: Windows-1252 (similar to ISO-8859-1) for English Windows, and other variations of ISO-8859-xx for other European countries (welcome to ISO soup). The same goes for Unix.
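
To make the "ISO soup" concrete, here is a minimal Python sketch (Python is used only for illustration; nothing here is Pharo code, and the helper name `decode_all` is made up for the example). It decodes the same three bytes under three Western European encodings and shows that the resulting characters differ, so bytes alone do not identify a file name.

```python
# The same raw bytes, as they might appear in a file name on disk.
raw = bytes([0x80, 0xA4, 0xE9])

def decode_all(data):
    """Decode the same bytes under three common Western European encodings."""
    encodings = ("cp1252", "iso-8859-1", "iso-8859-15")
    return {name: data.decode(name) for name in encodings}

decoded = decode_all(raw)
for name, text in decoded.items():
    # Print code points rather than glyphs so control characters stay visible.
    print(name, [hex(ord(c)) for c in text])
```

Byte 0x80 is the euro sign in Windows-1252 but a control character in both ISO variants, while byte 0xA4 is the euro sign only in ISO-8859-15.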

Java and .NET both use UTF-16 strings (I don't know about Python), where chars are not bytes and strings are handled as Character arrays, not as byte arrays. Both convert from the OS character set encoding to their internal encoding for strings (paths and whatever).
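
A rough sketch of that boundary conversion, again in Python purely as an illustration. The function names `path_from_os` and `path_to_os` are hypothetical, and the OS encoding is passed in explicitly as an assumption; a real runtime would obtain it from the locale or code page.

```python
def path_from_os(raw: bytes, os_encoding: str) -> str:
    """Convert a path from the OS byte encoding to an internal character string."""
    return raw.decode(os_encoding)

def path_to_os(text: str, os_encoding: str) -> bytes:
    """Convert an internal character string back to the OS byte encoding."""
    return text.encode(os_encoding)

name = "café.txt"
win = path_to_os(name, "cp1252")  # one byte for 'é'
nix = path_to_os(name, "utf-8")   # two bytes for 'é'

# Different bytes on each system, but the same internal string once decoded.
same = path_from_os(win, "cp1252") == path_from_os(nix, "utf-8")
print(win, nix, same)
```

Python's standard library offers `os.fsencode` and `os.fsdecode` for this conversion using the actual filesystem encoding.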


There is already UTF-8 and UTF-16 encoding support in Pharo, but the
standard String class uses bytes, and a lot of file, directory and
system methods use the ByteString class, and that is the problem here.

UTF-8 encoding in Pharo produces a variable-length ByteString, which is not the same as a (hypothetical) Utf8String in which all (variable-length) chars would be UTF-8 encoded.
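
The difference between an encoded byte string and a true character string can be sketched as follows. This is a Python illustration under the assumption that Python's `bytes` plays the role of a UTF-8 encoded ByteString and `str` the role of a character-level string; it is not Pharo code.

```python
text = "naïve €"
encoded = text.encode("utf-8")

# Per-character width in bytes: ASCII takes 1, 'ï' takes 2, '€' takes 3.
widths = [len(c.encode("utf-8")) for c in text]

# Indexing the character string yields a character;
# indexing the byte string yields one byte of a multi-byte sequence.
third_char = text[2]
third_byte = encoded[2]

# Cutting the byte string at an arbitrary index can split a character.
try:
    encoded[:3].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(len(text), len(encoded), widths, third_char, hex(third_byte), split_ok)
```

Length, indexing and slicing all work on bytes rather than characters once the string is held in encoded form, which is why every caller has to be aware of the encoding.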

Using a new UTF-8 or UTF-16 string class could be a major rework,
but a decision about the internal string encoding is needed.
As Sven says, there is no emergency and you have a workaround, but
perhaps using the existing WideString encoded as UTF16 (or UTF32?) in
some well defined classes/methods could be a good start for this rework?

IMHO the workaround of using UTF-8 encoded byte strings is not a good way to deal with this problem and should not be taken as "the solution".
