Re: [Pharo-users] Ridiculous we are
On 25 Sep 2014, at 8:55 , Alain Rastoul alf.mmm@gmail.com wrote: Le 25/09/2014 07:23, Sven Van Caekenberghe a écrit : On 25 Sep 2014, at 01:04, Alain Rastoul alf.mmm@gmail.com wrote: Le 25/09/2014 00:06, Sven Van Caekenberghe a écrit : Alain, The character encoding situation in Pharo is pretty good actually. The only problem is that there is some old school code left that encodes strings into strings, but today you can easily write much better and conceptually correct code. You could have a look at this draft chapter of the upcoming 'Enterprise Pharo' book that I am currently writing: http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/ Concerning file system paths, FilePathEncoder and FilePluginPrimitives already do the right thing. Now, your idea about using UTF-8 to represent internal Strings is something that has been discussed before and in many other languages as well. The short answer is that due to it being variable length, the inefficiency is (probably) just too high. Simple indexed access becomes a problem, let alone more complex string manipulations. I am not saying that it cannot be done, I think it is just not worth the trouble. The current solution in Pharo with ByteString and WideString is quite nice (check the chapter I mentioned before). Sven Very interesting ! It seems that most of what I was saying is already here :) I was not saying that Pharo should use utf8 (I mentionned utf8 because it is a standard, but I find the variable length encoding very weird), I was rather talking of using WideString in UTF 16 or 32 and that's done. I saw asWideString but didn't know about automatic convertion or codepoint selector and internal wide string support. Does it means that Pharo Greek users (for example) use WideString for Strings without having to specify it or make explicit convertions (except of course when dealing with bytes if they want to) ? 
If yes, very good, the job is almost done :) (personally I would also deprecate ByteString and get rid of it, just my opinion). Thanks for the link, another good chapter. Regards, Alain ByteString is important because it is an optimization of the most common case. I understand the point here: memory/data footprint, CPU cache and so on (not talking of encoding/decoding). I think that's why Microsoft chose UTF-16 (the old UCS-2) as a middle solution, because it covers most character sets with 2 bytes. It used to be a middle solution, back when UCS-2 could encode the entire defined Unicode set. Nowadays it's just the worst of both worlds; you waste memory for most normal text, *and* you don't have constant-time indexed code point access. The duality we have in Pharo is an attempt to achieve the *best* of both worlds: wasting little memory in the normal case (Latin-1) while maintaining constant-time indexed access in all cases. The ultimate solution for this approach would be a trio of string classes with slot sizes 8 - 16 - 32, expanding / contracting as needed, but we don't have classes with variable short slots. (Currently, they're planned in the new Cog, if I've understood Eliot's new object format correctly.) Cheers, Henry
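To make the 8 - 16 - 32 idea concrete, here is a minimal sketch in Python (not Pharo code, and not how ByteString or WideString are implemented). The helper names slot_width and utf16_code_units are made up for illustration. It picks the narrowest fixed slot size that fits every code point of a string, and shows why UTF-16 loses constant-time indexing once a character falls outside the 16-bit range.

```python
def slot_width(text: str) -> int:
    """Return the narrowest fixed slot size (in bits) that holds every code point."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 8    # Latin-1 range: one byte per character
    if highest <= 0xFFFF:
        return 16   # Basic Multilingual Plane
    return 32       # anything up to U+10FFFF


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units that UTF-16 needs for the text."""
    return len(text.encode("utf-16-le")) // 2


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the 16-bit range

for sample in ["\u00e9t\u00e9", "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1", clef]:
    print(sample, slot_width(sample), len(sample), utf16_code_units(sample))
```

For the last sample the character count and the UTF-16 code unit count differ, which is the point made above: with surrogate pairs, the n-th code point is no longer at a fixed offset.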
Re: [Pharo-users] Ridiculous we are
I'm not an expert and I would like to know what people think. But I think that we should consider the impact of the new Spur object format. I would like to have Unicode and clean up the leadChar. Stef
Re: [Pharo-users] Ridiculous we are
Sven I love this chapter. I will read it calmly now. Stef On 25/9/14 07:23, Sven Van Caekenberghe wrote: On 25 Sep 2014, at 01:04, Alain Rastoul alf.mmm@gmail.com wrote: Le 25/09/2014 00:06, Sven Van Caekenberghe a écrit : Alain, The character encoding situation in Pharo is pretty good actually. The only problem is that there is some old school code left that encodes strings into strings, but today you can easily write much better and conceptually correct code. You could have a look at this draft chapter of the upcoming 'Enterprise Pharo' book that I am currently writing: http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/ Concerning file system paths, FilePathEncoder and FilePluginPrimitives already do the right thing. Now, your idea about using UTF-8 to represent internal Strings is something that has been discussed before and in many other languages as well. The short answer is that due to it being variable length, the inefficiency is (probably) just too high. Simple indexed access becomes a problem, let alone more complex string manipulations. I am not saying that it cannot be done, I think it is just not worth the trouble. The current solution in Pharo with ByteString and WideString is quite nice (check the chapter I mentioned before). Sven Very interesting ! It seems that most of what I was saying is already here :) I was not saying that Pharo should use utf8 (I mentionned utf8 because it is a standard, but I find the variable length encoding very weird), I was rather talking of using WideString in UTF 16 or 32 and that's done. I saw asWideString but didn't know about automatic convertion or codepoint selector and internal wide string support. Does it means that Pharo Greek users (for example) use WideString for Strings without having to specify it or make explicit convertions (except of course when dealing with bytes if they want to) ? If yes, very good, job is almost done :) (personnally I would also deprecate ByteString, and get rid of it, just my opinion). 
Thanks for the link, another good chapter. Regards, Alain Yes, the Greek users won't notice a difference, it is all transparent. ByteString is important because it is an optimization of the most common case. As a normal user you should only think of abstract Strings and never use #asByteString (but use proper encoding). Feedback on the chapter is always welcome. Sven
Re: [Pharo-users] Ridiculous we are
On 26/09/2014 21:00, p...@highoctane.be wrote: I'd love another title for this thread. It depresses me. Yes, me too. Hilaire -- Dr. Geo - http://drgeo.eu iStoa - http://istoa.drgeo.eu
Re: [Pharo-users] Ridiculous we are
On 25 Sep 2014, at 5:00 , Hilaire Fernandes hila...@drgeo.eu wrote: On 24/09/2014 18:48, Benjamin Pollack wrote: On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire hila...@drgeo.eu wrote: On 23/09/2014 14:09, Damien Cassou wrote: I recently read documents about utf-8 encoding. In all of them, the author says that pathnames should be kept as is because you never know which encoding the filesystem uses. So, a filename should probably be a bytearray. yes, but a #é should be encoded in two bytes. As noted in my previous message, é could be represented as either one or two Unicode code points, and these in turn could validly be either two or three bytes in UTF-8. My gut says that $é should be U+00E9, because otherwise you should have to use two Characters ($e and $´), but you could legitimately argue otherwise as well, and at any rate, #é could definitely be either. This is likely the core of the issue you're hitting. As I understand it, #é should be encoded on two bytes and only two bytes. Only ASCII is coded as 1 byte with UTF-8. See ref. on Wikipedia Hilaire: Benjamin is talking about which Unicode normalization form é should be represented in, which is orthogonal to the encoding; http://en.wikipedia.org/wiki/Unicode_equivalence#Combining_and_precomposed_characters . So é can indeed be encoded in two different ways in UTF-8 (as in any other encoding): as #[c3 a9] (encoding U+00E9, Latin small letter e with acute), and as #[65 cc 81] (encoding U+0065, Latin small letter e, followed by U+0301, combining acute accent). Benjamin: Since the base path that contains the problematic character originates from a filesystem primitive, we can safely assume it's already in a canonical form*; Pharo does no automatic normalization.
(That is, if the path had been e + ´, the internal string would have two separate characters as well.) Cheers, Henry
* Only Mac OS X defines a canonical form for its paths anyway; the others don't care.
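The two byte sequences quoted above can be reproduced with a short sketch. It uses Python's standard unicodedata module rather than Pharo (which, as noted, does no normalization itself); the variable names are only illustrative.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

nfc_bytes = precomposed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")

print(nfc_bytes.hex(" "))   # c3 a9
print(nfd_bytes.hex(" "))   # 65 cc 81

# The two strings render the same but compare unequal until normalized.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

Normalization form and encoding are independent choices: the same UTF-8 encoder produced both byte sequences, and only the input code points differed.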
Re: [Pharo-users] Ridiculous we are
On 22 Sep 2014, at 10:07 , Hilaire hila...@drgeo.eu wrote: However font path seems ok: File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources. Inspecting this path, it looks like 'Téléchargements' is 8 bits, but it should be utf-8, right? I think there are issues on Windows, as some users reported to me. The fun thing about plugins calling external libraries is that you have to find out what that library does to know which encoding its char* parameters are meant to be passed in... In the case of FreeType, after some digging*, it seems to me it ends up calling fopen on all platforms, which on Windows... *drumroll* ... resolves to the legacy ANSI version** of the Windows file libraries. Hence, the correct encoding to use on Windows would be the locale's legacy code page. It also means that, on Windows, you *cannot* load fonts from a directory whose name is not encodable in the current code page, no matter what we do in Pharo (short of submitting a bug fix to the FreeType project). Cheers, Henry
* FT_New_Face (http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/src/base/ftobjs.c) calls FT_Open_Face (same), which calls FT_Stream_New (same), which calls FT_Stream_Open (http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/src/base/ftsystem.c), which calls ft_fopen (http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/include/config/ftstdlib.h), which resolves to fopen.
** http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx , don't be fooled: the Unicode support section is about contents written/read to/from the file, not the path parameter.
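The "not encodable in the current code page" limitation can be illustrated without FreeType at all. The sketch below is Python, the helper encodable_in and both paths are hypothetical, and cp1252 / cp1253 stand in for a Western European and a Greek Windows code page. It only checks whether a path survives conversion to a legacy code page, which is the precondition for an ANSI fopen call to find the file.

```python
def encodable_in(path: str, codepage: str) -> bool:
    """True if every character of the path exists in the given legacy code page."""
    try:
        path.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical install locations, for illustration only.
french = "C:/Users/hilaire/T\u00e9l\u00e9chargements/DrGeo.app"
greek = "C:/Users/hilaire/\u0395\u03bb\u03bb\u03ac\u03b4\u03b1/DrGeo.app"

print(encodable_in(french, "cp1252"))  # True
print(encodable_in(greek, "cp1252"))   # False
print(encodable_in(greek, "cp1253"))   # True
```

On a machine whose code page is cp1252, the Greek directory has no byte representation at all, so no encoding choice on the caller's side could make a char* path reach it.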
Re: [Pharo-users] Ridiculous we are
Le 25/09/2014 07:23, Sven Van Caekenberghe a écrit : On 25 Sep 2014, at 01:04, Alain Rastoul alf.mmm@gmail.com wrote: Le 25/09/2014 00:06, Sven Van Caekenberghe a écrit : Alain, The character encoding situation in Pharo is pretty good actually. The only problem is that there is some old school code left that encodes strings into strings, but today you can easily write much better and conceptually correct code. You could have a look at this draft chapter of the upcoming 'Enterprise Pharo' book that I am currently writing: http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/ Concerning file system paths, FilePathEncoder and FilePluginPrimitives already do the right thing. Now, your idea about using UTF-8 to represent internal Strings is something that has been discussed before and in many other languages as well. The short answer is that due to it being variable length, the inefficiency is (probably) just too high. Simple indexed access becomes a problem, let alone more complex string manipulations. I am not saying that it cannot be done, I think it is just not worth the trouble. The current solution in Pharo with ByteString and WideString is quite nice (check the chapter I mentioned before). Sven Very interesting ! It seems that most of what I was saying is already here :) I was not saying that Pharo should use utf8 (I mentionned utf8 because it is a standard, but I find the variable length encoding very weird), I was rather talking of using WideString in UTF 16 or 32 and that's done. I saw asWideString but didn't know about automatic convertion or codepoint selector and internal wide string support. Does it means that Pharo Greek users (for example) use WideString for Strings without having to specify it or make explicit convertions (except of course when dealing with bytes if they want to) ? If yes, very good, job is almost done :) (personnally I would also deprecate ByteString, and get rid of it, just my opinion). Thanks for the link, another good chapter . 
Regards, Alain ByteString is important because it is an optimization of the most common case. I understand the point here: memory/data footprint, CPU cache and so on (not talking of encoding/decoding). I think that's why Microsoft chose UTF-16 (the old UCS-2) as a middle solution, because it covers most character sets with 2 bytes. Maybe I'm excessive, but I have reasons: I once had to debug a French program used in China by a Chinese user who was seeing weird characters on a (weird-to-me) Chinese Windows XP ... a missing WideString and a great moment of loneliness :) As a normal user you should only think of abstract Strings and never use #asByteString (but use proper encoding). Feedback on the chapter is always welcome. Sven Agree. Your chapter is excellent, I played a bit with the Zn encoders. I look forward to Pharo for the Enterprise on Lulu. However, I'm wondering: WideString being a variableWordSubclass: with 32-bit words on a 32-bit VM, what will it become on a 64-bit VM? 32-bit words or 64-bit words? Immediate characters (seen on Clément Bera's blog about Spur and the new object format)? Alain
Re: [Pharo-users] Ridiculous we are
On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire hila...@drgeo.eu wrote: On 23/09/2014 14:09, Damien Cassou wrote: I recently read documents about utf-8 encoding. In all of them, the author says that pathnames should be kept as is because you never know which encoding the filesystem uses. So, a filename should probably be a bytearray. yes, but a #é should be encoded in two bytes. As noted in my previous message, é could be represented as either one or two Unicode code points, and these in turn could validly be either two or three bytes in UTF-8. My gut says that $é should be U+00E9, because otherwise you should have to use two Characters ($e and $´), but you could legitimately argue otherwise as well, and at any rate, #é could definitely be either. This is likely the core of the issue you're hitting.
Re: [Pharo-users] Ridiculous we are
On Mon, 22 Sep 2014 17:58:41 -0400, Sven Van Caekenberghe s...@stfx.eu wrote: I also find the way some problems are reported quite disturbing. How much testing did you do ? On which platforms ? I can do this (in Pharo 3) without any problems (we're talking about arbitrary Unicode characters in path names):

    ('/tmp' asFileReference / 'été') ensureCreateDirectory.
    '/tmp/été' asFileReference exists.
    ('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out | out << 'What about Greece ?' ].
    ('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
    ('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.

And in a terminal, I get:

    $ ls /tmp/été/Ελλάδα.txt
    /tmp/été/Ελλάδα.txt
    $ cat !$
    cat /tmp/été/Ελλάδα.txt
    What about Greece ?

This is on Mac OS X. So this part fundamentally works in the image and on one VM. There might of course be problems in how paths are used in certain places or on certain VM/platforms. Focusing purely on Unicode itself (not the encoding systems), a letter like é can be represented as U+00E9 (LATIN SMALL LETTER E WITH ACUTE), or as U+0065 (LATIN SMALL LETTER E) followed by U+0301 (COMBINING ACUTE ACCENT). These will appear identical to the user, but are emphatically *not* identical for most software. The way you're testing here, you will not hit any error relating to this concept, ever, because you're using Pharo for both generating and consuming the strings. At the very least, we'd need to generate a file named été with both forms explicitly and see what happens. Things get even more exciting, though, because Unix says that file names are simply arbitrary byte patterns that do not contain the null byte.* Thus, you can trivially create a file named été using Latin-1 encoding, and again using UTF-8 encoding, and again using UTF-7 encoding, and these might all be shown to the user as identically named, but I guarantee you that Pharo will not act sanely with all four of these.
Even on Windows, where things are a bit saner (NTFS mandates UTF-16), and where an explicit normalization form is preferred (NFC), I just explicitly verified that I can trivially inject other normalization forms into the file system. Thus, you can still have two files named été that nevertheless have different names as far as the OS is concerned. In this case, as far as I can tell, Pharo assumes that all path names are Unicode, and does not do any work to convert strings to or from the various normalization schemes (looking in Path class>>canonicalizeElements:, Path class>>from:delimiter:, and FileSystemStore>>pathFromString: here). There's therefore a pretty straightforward fix that Pharo could do:
1. Path would use ByteArrays as the actual canonical store, and provide convenience methods to see what the array decodes to in various encodings. The developer and application can make decisions about what encoding system they want to use.
2. The VM likely needs to be modified to handle this (didn't check).
As much as I wish Hilaire had provided more details in his bug report, it's worth keeping in mind that not all users, or even all programmers, understand the full implications of things like how various Unicode normalization and encoding schemes interact in practice with Unix's very vague concept of what a file name actually is, so I usually try to approach these bug reports carefully and with an open mind. --Benjamin
* On OS X, HFS+ uses UTF-16 with an Apple-specific variant of NFD, whereas I do not believe this holds for e.g. UFS or FUSE-backed file systems, so things are a bit subtler there, but the general rule holds.
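As a rough illustration of the "same name, different bytes" point, the following Python sketch spells the name été four ways. Reading the "four" as Latin-1, UTF-8 in NFC, UTF-8 in NFD and UTF-7 is an assumption made here, not something the message states. Nothing touches the file system; the sketch only shows the byte strings a Unix kernel would treat as four unrelated file names.

```python
import unicodedata

name = "\u00e9t\u00e9"  # "été" written with precomposed é

spellings = {
    "latin-1": name.encode("latin-1"),
    "utf-8 (NFC)": unicodedata.normalize("NFC", name).encode("utf-8"),
    "utf-8 (NFD)": unicodedata.normalize("NFD", name).encode("utf-8"),
    "utf-7": name.encode("utf-7"),
}

for label, raw in spellings.items():
    print(f"{label:12} {len(raw)} bytes  {raw!r}")

# All four byte strings differ, and none contains a null byte,
# so each is a legal and distinct Unix file name.
print(len(set(spellings.values())))
```

Only the decoding step decides whether these look identical to a user, which is why storing paths as raw bytes and decoding on demand keeps every one of them reachable.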
Re: [Pharo-users] Ridiculous we are
On 24 Sep 2014, at 18:48, Benjamin Pollack benja...@bitquabit.com wrote: On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire hila...@drgeo.eu wrote: Le 23/09/2014 14:09, Damien Cassou a écrit : I recently read documents about utf-8 encoding. In all of them, the author says that pathnames should be kept as is because you never know which encoding the filesystem uses. So, a filename should probably be a bytearray. yes, but a #é should be encoded in two bytes. As noted in my previous message, é could be represented as either one or two Unicode code points, and these in turn could validly be either two or three bytes in UTF-8. My gut says that $é should be U+00E9, because otherwise you should have to use two Characters ($e and $´), but you could legitimately argue otherwise as well, and at any rate, #é could definitely be either. This is likely the core of the issue you're hitting. Did you read the actual conversation in the issue ? https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters It has been renamed and there is a fix (as a change set, not as a slice, yet). Basically, there was a primitive call into a plugin that failed to do encoding. Now regarding the issues you raised. Pharo does not do Unicode canonicalisation or any of that other fancy stuff (like categorisation, proper ordering and so on). This is another orthogonal and way more general issue. Regarding the pathnames encoding: if the OS itself does not know it, how can we ? I think that the current approach (assuming UTF-8) makes (the most) sense for a system that runs on multiple platforms. Sven
Re: [Pharo-users] Ridiculous we are
On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe s...@stfx.eu wrote: Did you read the actual conversation in the issue ? https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters It has been renamed and there is a fix (as a change set, not as a slice, yet). Basically, there was a primitive call into a plugin that failed to do encoding. No, I apologize; I missed the bug link. Thanks for reposting it. Now regarding the issues you raised. Pharo does not do Unicode canonicalisation or any of that other fancy stuff (like categorisation, proper ordering and so on). This is another orthogonal and way more general issue. Regarding the pathnames encoding: if the OS itself does not know it, how can we ? That's actually the argument *against* using UTF-8 as the standard Pharo way to represent filenames--at least on Unix systems. If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case. This is an old discussion, and I do see both sides of it. In terms of SCMs, Mercurial and Git both just say it's a collection of bytes, whereas Subversion says it's Unicode code points. This has some uncomfortable implications for both systems when working on multiple platforms. --Benjamin
Re: [Pharo-users] Ridiculous we are
On 24 Sep 2014, at 19:09, Benjamin Pollack benja...@bitquabit.com wrote: On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe s...@stfx.eu wrote: Did you read the actual conversation in the issue ? https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters It has been renamed and there is a fix (as a change set, not as a slice, yet). Basically, there was a primitive call into a plugin that failed to do encoding. No, I apologize; I missed the bug link. Thanks for reposting it. Now regarding the issues you raised. Pharo does not do Unicode canonicalisation or any of that other fancy stuff (like categorisation, proper ordering and so on). This is another orthogonal and way more general issue. Regarding the pathnames encoding: if the OS itself does not know it, how can we ? That's actually the argument *against* using UTF-8 as the standard Pharo way to represent filenames--at least on Unix systems. If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case. This is an old discussion, and I do see both sides of it. In terms of SCMs, Mercurial and Git both just say it's a collection of bytes, whereas Subversion says it's Unicode code points. This has some uncomfortable implications for both systems when working on multiple platforms. Benjamin, I think I understand the concern / situation that you describe. But I fail to see how not-interpreting it and interpreting it in different encodings can work in practice, especially since your point seems to be that there is no meta information that gives a definitive answer. I would guess that other languages, say Java or Python, have some approach to handle this problem ? 
Also, since we are living with the current approach without much problems, I think the issue is not terribly pressing. Sven
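On the question of how other languages cope, one data point is Python 3, which keeps path names as text but decodes them with the "surrogateescape" error handler so that undecodable bytes survive a round trip. The sketch below shows the mechanism on a plain byte string; the file name is a made-up example, and this illustrates Python's behaviour rather than proposing anything for Pharo.

```python
# "été.txt" as Latin-1 bytes; this sequence is not valid UTF-8.
raw_name = b"\xe9t\xe9.txt"

# Strict decoding refuses the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# (U+DC80..U+DCFF), so the text form still identifies the original bytes.
as_text = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = as_text.encode("utf-8", errors="surrogateescape")

print(strict_ok)               # False
print(ascii(as_text))          # '\udce9t\udce9.txt'
print(round_trip == raw_name)  # True
```

The trade-off is that such strings are not valid Unicode text: they can be passed back to the OS unchanged, but displaying or re-encoding them strictly will fail.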
Re: [Pharo-users] Ridiculous we are
On 24/09/2014 19:09, Benjamin Pollack wrote: If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case. Hi Ben, I strongly disagree with you on this point: using byte arrays (or byte strings) is a pain in an international context. The OS knows about its encoding: locale for Unix, code page for Windows. Windows code pages depend on the country: for English Windows, 1252 (similar to ISO-8859-1); for other European countries, other variations of 8859-xx... (welcome to ISO soup); same for Unix. Java uses UTF8 strings and dotNet uses UTF16 strings (don't know for Python), where chars are not bytes and they are not used as byte arrays but as Character arrays. Both do conversions from the OS character set encoding to the internal encoding for strings (paths and whatever). There is already UTF8 and UTF16 encoding support in Pharo, but the standard String class uses bytes, and a lot of files, directories and system methods use the ByteString class, and that is the problem here. UTF8 encoding in Pharo encodes to a variable-length ByteString, which is not the same as a (hypothetical) Utf8String where all (variable-length) chars would be utf8 encoded. Using a new UTF8 or UTF16 string class could be a major rework, but taking a decision about the internal string encoding is needed. As Sven says, there is no emergency and you have a workaround, but perhaps using the existing WideString encoded as UTF16 (or UTF32?) in some well-defined classes/methods could be a good start for this rework? IMHO the workaround of using utf8 encoded byte strings is not a good way to deal with this problem and should not be accepted as the solution.
Re: [Pharo-users] Ridiculous we are
Alain, On 24 Sep 2014, at 23:00, Alain Rastoul alf.mmm@gmail.com wrote: Le 24/09/2014 19:09, Benjamin Pollack a écrit : If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case. Hi Ben, I strongly disagree with you on this point: using byte arrays (or byte strings) is a pain in an international context. The OS knows about its encoding: locale for unix, code page for windows. Windows code pages depends on country, for english windows 1252 (similar to iso-8859-1), for other european countries, other variations of 8859-xx... (welcome to ISO soup), same for unix. Java uses UTF8 strings and dotNet uses UTF16 strings (don't know for Python) where chars are not bytes and they are not used as byte arrays but as Character arrays. Both do conversions from OS character set encoding to internal encoding for strings (paths and whatever). There is already an UTF8 and UTF16 encoding support in Pharo, but the standard String class uses bytes, and lot of files, directories and system methods use ByteString class and that is the problem here. UTF8 encoding in Pharo encodes to a variable lenght ByteString, which is not the same as an (hypothetical) Utf8String where all (variable length) chars would be utf8 encoded. Using a new UTF8 or UTF16 string class could be a major rework, but taking a decision about about internal string encoding is needed. As Sven says, there is no emergency and you have a workaround, but perhaps using the existing WideString encoded as UTF16 (or UTF32?) in some well defined classes/methods could be a good start for this rework? IMHO the workaround of using utf8 encoded byte strings is not a good way to deal with this problem and should not be granted as the solution. The character encoding situation in Pharo is pretty good actually. 
The only problem is that there is some old school code left that encodes strings into strings, but today you can easily write much better and conceptually correct code. You could have a look at this draft chapter of the upcoming 'Enterprise Pharo' book that I am currently writing: http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/ Concerning file system paths, FilePathEncoder and FilePluginPrimitives already do the right thing. Now, your idea about using UTF-8 to represent internal Strings is something that has been discussed before and in many other languages as well. The short answer is that due to it being variable length, the inefficiency is (probably) just too high. Simple indexed access becomes a problem, let alone more complex string manipulations. I am not saying that it cannot be done, I think it is just not worth the trouble. The current solution in Pharo with ByteString and WideString is quite nice (check the chapter I mentioned before). Sven
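To show why simple indexed access becomes a problem with a variable-length encoding, here is a small Python sketch; the function utf8_char_at is hypothetical and written only for this illustration. Finding the n-th character in UTF-8 bytes means scanning lead bytes from the start, so the cost grows with the index instead of being a single array lookup.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the code point at a zero-based character index by scanning from the start."""
    if index < 0:
        raise IndexError(index)
    seen = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:       # not a continuation byte: a code point starts here
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:   # next code point found: the wanted one ends here
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(index)
    return data[start:].decode("utf-8")


data = "Ελλάδα.txt".encode("utf-8")
print(len(data))              # 16 bytes for 10 characters
print(utf8_char_at(data, 5))  # α
print(utf8_char_at(data, 6))  # .
```

With fixed-width slots, as in ByteString and WideString, the same lookup is a direct offset computation, which is the efficiency argument made above.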
Re: [Pharo-users] Ridiculous we are
On 25/09/2014 00:06, Sven Van Caekenberghe wrote: Alain, The character encoding situation in Pharo is pretty good actually. The only problem is that there is some old school code left that encodes strings into strings, but today you can easily write much better and conceptually correct code. You could have a look at this draft chapter of the upcoming 'Enterprise Pharo' book that I am currently writing: http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/ Concerning file system paths, FilePathEncoder and FilePluginPrimitives already do the right thing. Now, your idea about using UTF-8 to represent internal Strings is something that has been discussed before and in many other languages as well. The short answer is that due to it being variable length, the inefficiency is (probably) just too high. Simple indexed access becomes a problem, let alone more complex string manipulations. I am not saying that it cannot be done, I think it is just not worth the trouble. The current solution in Pharo with ByteString and WideString is quite nice (check the chapter I mentioned before). Sven Very interesting! It seems that most of what I was saying is already here :) I was not saying that Pharo should use utf8 (I mentioned utf8 because it is a standard, but I find the variable-length encoding very weird); I was rather talking of using WideString in UTF 16 or 32, and that's done. I saw asWideString but didn't know about automatic conversion, the codepoint selector, or internal wide string support. Does it mean that Pharo Greek users (for example) use WideString for Strings without having to specify it or make explicit conversions (except of course when dealing with bytes if they want to)? If yes, very good, the job is almost done :) (personally I would also deprecate ByteString and get rid of it, just my opinion). Thanks for the link, another good chapter. Regards, Alain
Re: [Pharo-users] Ridiculous we are
On 25 Sep 2014, at 01:04, Alain Rastoul alf.mmm@gmail.com wrote: Le 25/09/2014 00:06, Sven Van Caekenberghe a écrit : Alain, The character encoding situation in Pharo is pretty good actually. The only problem is that there is some old school code left that encodes strings into strings, but today you can easily write much better and conceptually correct code. You could have a look at this draft chapter of the upcoming 'Enterprise Pharo' book that I am currently writing: http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/ Concerning file system paths, FilePathEncoder and FilePluginPrimitives already do the right thing. Now, your idea about using UTF-8 to represent internal Strings is something that has been discussed before and in many other languages as well. The short answer is that due to it being variable length, the inefficiency is (probably) just too high. Simple indexed access becomes a problem, let alone more complex string manipulations. I am not saying that it cannot be done, I think it is just not worth the trouble. The current solution in Pharo with ByteString and WideString is quite nice (check the chapter I mentioned before). Sven Very interesting ! It seems that most of what I was saying is already here :) I was not saying that Pharo should use utf8 (I mentionned utf8 because it is a standard, but I find the variable length encoding very weird), I was rather talking of using WideString in UTF 16 or 32 and that's done. I saw asWideString but didn't know about automatic convertion or codepoint selector and internal wide string support. Does it means that Pharo Greek users (for example) use WideString for Strings without having to specify it or make explicit convertions (except of course when dealing with bytes if they want to) ? If yes, very good, job is almost done :) (personnally I would also deprecate ByteString, and get rid of it, just my opinion). Thanks for the link, another good chapter . 
Regards, Alain Yes, the Greek users won't notice a difference, it is all transparent. ByteString is important because it is an optimization of the most common case. As a normal user you should only think of abstract Strings and never use #asByteString (but use proper encoding). Feedback on the chapter is always welcome. Sven
Re: [Pharo-users] Ridiculous we are
On Mon, Sep 22, 2014 at 10:07 PM, Hilaire hila...@drgeo.eu wrote: However font path seems ok: File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources. Inspecting this path, it looks like 'Téléchargements' is 8 bits, but it should be utf-8, right? I recently read documents about utf-8 encoding. In all of them, the author says that pathnames should be kept as is because you never know which encoding the filesystem uses. So, a filename should probably be a bytearray. -- Damien Cassou http://damiencassou.seasidehosting.st Success is the ability to go from one failure to another without losing enthusiasm. Winston Churchill
Re: [Pharo-users] Ridiculous we are
On 23/09/2014 14:09, Damien Cassou wrote:

I recently read documents about utf-8 encoding. In all of them, the author says that pathnames should be kept as is because you never know which encoding the filesystem uses. So, a filename should probably be a bytearray.

Yes, but an #é should be encoded in two bytes. Although it looks strange, I am not sure that is the exact problem, because I can use accented file names for sketches; the problem only arises when loading a font. So it may be the code that loads the font (cf. my bug report).

Hilaire -- Dr. Geo - http://drgeo.eu iStoa - http://istoa.drgeo.eu
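The two-byte remark is easy to check with a short sketch, again in Python for illustration only. It compares the same directory name encoded as Latin-1 (one byte per character) and as UTF-8 (two bytes for each 'é'), and shows what UTF-8 bytes look like when they are misread as Latin-1.

```python
name = "Téléchargements"

latin1_bytes = name.encode("latin-1")  # one byte per character
utf8_bytes = name.encode("utf-8")      # each 'é' becomes 0xC3 0xA9

print(len(name), len(latin1_bytes), len(utf8_bytes))
print("é".encode("utf-8"))

# Reading UTF-8 bytes with the wrong (Latin-1) decoder gives mojibake:
# every 'é' turns into the two characters 'Ã' and '©'.
misread = utf8_bytes.decode("latin-1")
print(len(misread))
```

So a path that shows exactly one 8-bit element per accented character, as in the inspector observation, is consistent with a Latin-1 style in-image string that still has to be encoded before it reaches the filesystem.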
[Pharo-users] Ridiculous we are
Hello,

Tested on Linux: when I move the DrGeo.app folder under a directory tree containing accented characters (for example, /home/hilaire/Téléchargement/), loading fonts does not work. However, the font path seems OK: File @ /home/hilaire/Téléchargements/DrGeo.app/Contents/Resources. Inspecting this path, it looks like 'Téléchargements' is 8 bits, but it should be UTF-8, right?

I think there are issues on Windows too, as some users have reported to me. Holy shit.

Hilaire -- Dr. Geo - http://drgeo.eu iStoa - http://istao.drgeo.eu
Re: [Pharo-users] Ridiculous we are
:-( I fear I will soon face the same problem, when I start my lecture…

Alexandre -- Alexandre Bergel http://www.bergel.eu
Re: [Pharo-users] Ridiculous we are
Can you create an issue? I am cleaning up the fonts, and in some cases I could take this issue into account. If the problem occurs only on Windows, I will need someone's assistance.

Cheers, Juraj
Re: [Pharo-users] Ridiculous we are
You can use screenshots. But back to the issue: in other parts of DrGeo, when saving or loading a sketch, paths and filenames with accents or spaces are OK. So I am not sure what's going on.

Hilaire

On 22/09/2014 22:15, Alexandre Bergel wrote:

:-( I will soon face the same problem I fear, when I will start my lecture… Alexandre

-- Dr. Geo - http://drgeo.eu iStoa - http://istao.drgeo.eu
Re: [Pharo-users] Ridiculous we are
Hilaire,

For two days now, after upgrading my iPhone, the recovery process has been crashing. After two days of trying I finally succeeded in uploading my recovery to my iPhone, and now my iPhone crashes continuously at boot time. I get a nice sepia screenshot and it restarts. I will have to send my iPhone to Apple for a real check. Just because I did an update!

So I do not accept the title of your email. I simply cannot. Can you imagine the billions injected into the iPhone? The iPhone is probably one order of magnitude more complex than Pharo, but the money injected into Pharo is our collective time, and it is far from being only one order of magnitude smaller than several billions.

Stef
Re: [Pharo-users] Ridiculous we are
The issue is already there: https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

I am trying to document it, but it is odd, because in some other parts of DrGeo I don't have any issue with accented paths. But shouldn't the path be UTF-8 encoded? Or is my fresh Linux Mint box using non-UTF-8 filenames? No, it can't be.

Hilaire

On 22/09/2014 22:20, Juraj Kubelka wrote:

Can you create an issue? I am cleaning the fonts and in some case I could consider this issue. If it is problem only on Windows, I will need someone’s assistance.

-- Dr. Geo - http://drgeo.eu iStoa - http://istao.drgeo.eu
Re: [Pharo-users] Ridiculous we are
On 22/09/2014 22:35, stepharo wrote:

So I do not accept the title of your email. Simply I cannot.

Don't worry, it is a temporary cry/yell of frustration.

-- Dr. Geo - http://drgeo.eu iStoa - http://istao.drgeo.eu
Re: [Pharo-users] Ridiculous we are
Also, sometimes things look like Téléchargement but are still Downloads under the hood, as the OS translates the UI.

Phil
Re: [Pharo-users] Ridiculous we are
On 22/09/2014 23:14, p...@highoctane.be wrote:

Also, sometimes things do look like Téléchargement but are still Downloads under the hood as the OS translates the UI.

Yes, I checked with another path of my own, like 'été', and still get the same issue. The strange thing is that I have no issue searching for sketch files with accents, only when loading the font.

Hilaire -- Dr. Geo - http://drgeo.eu iStoa - http://istoa.drgeo.eu
Re: [Pharo-users] Ridiculous we are
There is a similar issue for Windows, 13127: https://pharo.fogbugz.com/default.asp?13127 "can not (always) read permissions for directoryentries on a path with nonascii characters"
Re: [Pharo-users] Ridiculous we are
I also find the way some problems are reported quite disturbing. How much testing did you do? On which platforms?

I can do this (in Pharo 3) without any problems (we're talking about arbitrary Unicode characters in path names):

    ('/tmp' asFileReference / 'été') ensureCreateDirectory.
    '/tmp/été' asFileReference exists.
    ('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out | out << 'What about Greece ?' ].
    ('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
    ('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.

And in a terminal, I get:

    $ ls /tmp/été/Ελλάδα.txt
    /tmp/été/Ελλάδα.txt
    $ cat !$
    cat /tmp/été/Ελλάδα.txt
    What about Greece ?

This is on Mac OS X. So this part fundamentally works in the image and on one VM. There might of course be problems in how paths are used in certain places or on certain VMs/platforms.

Sven
Re: [Pharo-users] Ridiculous we are
So I stay with my 8 GB iTouch on iOS 3; with no prospect of an upgrade, I am sorta worry-free. If only it were also a phone ... Don't dial ... DO ! ;-)

[ this msg was last seen in my default font ]