On 24/09/2014 19:09, Benjamin Pollack wrote:

> If Pharo used ByteArrays to represent paths, with convenience methods for
> working with UTF-8 (since I do agree that's the most likely thing for a
> user/dev to want), then you'd be able to work with all files no matter what,
> *and* have a convenient way of doing so for the common case.

Hi Ben,
I strongly disagree with you on this point: using byte arrays (or byte strings) is a pain in an international context.
The OS knows its own encoding: the locale on Unix, the code page on Windows.
Windows code pages depend on the country: Windows-1252 (similar to ISO-8859-1) for English Windows, other 8859-xx variations for other European countries (welcome to the ISO soup). The same goes for Unix.
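
A minimal sketch of the code page problem, written in Python purely for illustration (nothing here is Pharo API, and the byte value is a made-up example): the same byte read from a file name turns into a different character depending on which OS encoding is assumed.

```python
# One byte, as it might appear in a file name on disk (hypothetical example).
raw = b"\xa4"

# The decoded character depends entirely on the code page we assume.
as_cp1252 = raw.decode("cp1252")       # Windows-1252: CURRENCY SIGN (U+00A4)
as_latin9 = raw.decode("iso-8859-15")  # ISO-8859-15: EURO SIGN (U+20AC)

print(repr(as_cp1252), repr(as_latin9))
print(as_cp1252 == as_latin9)  # False: same byte, two different characters
```

This is why a raw byte array alone cannot tell you what a path "says": the bytes only become text once the OS encoding is known.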

Java and .NET both use UTF-16 strings (I don't know about Python), where chars are not bytes: strings are handled as Character arrays, not as byte arrays. Both convert from the OS character set encoding to their internal encoding for strings (paths and everything else).

Pharo already has UTF-8 and UTF-16 encoding support, but the
standard String class uses bytes, and a lot of file, directory and
system methods use the ByteString class; that is the problem here.
UTF-8 encoding in Pharo produces a variable-length ByteString, which is not the same as a (hypothetical) Utf8String in which every (variable-length) character would be UTF-8 encoded.
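
To make the difference between "a byte string holding UTF-8" and "a string of characters" concrete, here is a small sketch, again in Python for illustration only (the sample word is an assumption, not taken from the original report). Once text is UTF-8 encoded, byte count and character count diverge, and slicing by byte index can cut a character in half.

```python
name = "\u00e9t\u00e9"          # "été": 3 characters
encoded = name.encode("utf-8")  # 5 bytes: each "é" takes 2 bytes, "t" takes 1

print(len(name), len(encoded))

# Taking the "first character" by byte index yields only half of "é".
try:
    encoded[:1].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(split_ok)
```

Any code that treats the encoded byte string as if it were the string itself (size, indexing, truncation) has to be aware of this, which is the burden a dedicated string class would carry instead.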
Introducing a new UTF-8 or UTF-16 string class could be a major rework,
but a decision about the internal string encoding is needed.
As Sven says, there is no emergency and you have a workaround, but
perhaps using the existing WideString, encoded as UTF-16 (or UTF-32?), in
some well-defined classes/methods could be a good start for this rework?
IMHO the workaround of using UTF-8 encoded byte strings is not a good way to deal with this problem and should not be accepted as "the solution".

