On 25/09/2014 00:06, Sven Van Caekenberghe wrote:
Alain,

The character encoding situation in Pharo is pretty good actually. The only 
problem is that there is some old school code left that encodes strings into 
strings, but today you can easily write much better and conceptually correct 
code.

You could have a look at this draft chapter of the upcoming 'Enterprise Pharo' 
book that I am currently writing:

   http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/

Concerning file system paths, FilePathEncoder and FilePluginPrimitives already 
do the right thing.

Now, your idea about using UTF-8 to represent internal Strings is something 
that has been discussed before and in many other languages as well. The short 
answer is that due to it being variable length, the inefficiency is (probably) 
just too high. Simple indexed access becomes a problem, let alone more complex 
string manipulations. I am not saying that it cannot be done, I think it is 
just not worth the trouble. The current solution in Pharo with ByteString and 
WideString is quite nice (check the chapter I mentioned before).
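
To illustrate the indexed-access point, here is a minimal sketch in Python (not Pharo code; the helper names utf8_sequence_length, utf8_char_at and utf32_char_at are made up for this example). It assumes well-formed UTF-8 input and compares finding the n-th character in UTF-8 bytes, which requires a scan from the start, with a fixed-width layout of 4 bytes per character, where the position can be computed directly.

```python
def utf8_sequence_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2          # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3          # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4          # 11110xxx
    raise ValueError("not a UTF-8 lead byte")


def utf8_char_at(data: bytes, index: int) -> str:
    """Return the index-th character (0-based) by walking from the first byte.

    The cost grows with index because every earlier character must be skipped.
    """
    pos = 0
    for _ in range(index):
        pos += utf8_sequence_length(data[pos])
    length = utf8_sequence_length(data[pos])
    return data[pos:pos + length].decode("utf-8")


def utf32_char_at(data: bytes, index: int) -> str:
    """Return the index-th character (0-based) of little-endian UTF-32 data.

    Every character occupies exactly 4 bytes, so the offset is index * 4.
    """
    return data[index * 4:index * 4 + 4].decode("utf-32-le")


text = "aλ€b"                      # 1-, 2-, 3- and 1-byte characters in UTF-8
as_utf8 = text.encode("utf-8")     # 7 bytes for 4 characters
as_utf32 = text.encode("utf-32-le")  # 16 bytes for 4 characters

print(utf8_char_at(as_utf8, 2), utf32_char_at(as_utf32, 2))
```

Both calls return the same character, but only the fixed-width version gets there without inspecting the preceding ones. That is the trade-off described above: UTF-8 saves space, fixed-width storage keeps indexing simple.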

Sven

Very interesting!
It seems that most of what I was saying is already here :)
I was not saying that Pharo should use UTF-8 (I mentioned UTF-8 because it is a standard, but I find the variable-length encoding very weird); I was rather talking about using WideString in UTF-16 or UTF-32, and that's done. I saw asWideString but didn't know about the automatic conversion, the codepoint selector, or the internal wide string support. Does it mean that Greek Pharo users (for example) use WideString for Strings without having to specify it or make explicit conversions (except of course when dealing with bytes, if they want to)?
If yes, very good, job is almost done :)
(Personally I would also deprecate ByteString and get rid of it, just my opinion.)
Thanks for the link, another good chapter.

Regards,

Alain


