On Sun, May 20, 2012 at 10:37 PM, Austin Ziegler <[email protected]> wrote:
> On Wed, May 16, 2012 at 9:02 AM, Brian Candler <[email protected]> wrote:
>> I will add that the OP is not entirely alone in his opinion.
>
> The OP may not be alone in his opinion, but that's because encodings
> are broken in general.
>
> This is *not* a Ruby problem, this is a *data* problem.

I couldn't agree more.

> C gets it wrong because it assumes that characters, code points, and
> bytes are the same (but it gets a pass because it was created in a
> time when this was true).

And at least in C++ there are measures for multibyte characters.

> Java gets it wrong because it uses a nominally-UTF-16 character width
> (it's actually UCS-2) which doesn't allow for UTF-16 surrogates.

Actually, the situation with Java is even worse: at some point (i.e.
with Java 5) they decided to add methods for dealing with all the
Unicode code points which cannot be represented in 16 bits. See here
for example:

http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isDigit%28int%29

Unfortunately they had to maintain compatibility with the old 16-bit
char type and hence used int for the 32-bit code point representation.
I am not sure how often these new methods are actually used, but my
guess would be: rarely. If you use them, code will become messy
soon...

> [...]
> Ruby got it right, because it acknowledges that (a) this is hard and
> (b) gives you the tools you need in order to make this less painful.
> It also doesn't (c) incorrectly assume that everything is or can be
> expressed safely in Unicode. (Shift-JIS will not roundtrip to Unicode
> and back for some characters.)

+1 to this and your other statements.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
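
To make the distinction between characters, code points, bytes, and 16-bit code units discussed above a bit more concrete, here is a minimal Ruby sketch. The helper name `unit_counts` is hypothetical (it is not part of Ruby), and the sample string is only an assumption chosen for illustration: it contains one character outside the 16-bit range, which is the case that needs a surrogate pair in UTF-16.

```ruby
# Hypothetical helper (not part of Ruby's standard library): counts the
# same string in four different units.
def unit_counts(str)
  {
    chars:       str.chars.length,                      # characters as Ruby sees them
    codepoints:  str.codepoints.length,                 # Unicode code points
    utf8_bytes:  str.encode("UTF-8").bytesize,          # bytes when stored as UTF-8
    utf16_units: str.encode("UTF-16BE").bytesize / 2    # 16-bit code units in UTF-16
  }
end

# "a" (U+0061), "e" with acute accent (U+00E9), and U+1D7D8
# (MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO), which lies outside the
# 16-bit range and therefore takes two UTF-16 code units.
sample = "a\u00E9\u{1D7D8}"
counts = unit_counts(sample)

counts.each { |unit, n| puts "#{unit}: #{n}" }
```

For this sample the four counts are not all equal: three characters and three code points occupy seven bytes in UTF-8 and four 16-bit units in UTF-16. Any API that treats one of these units as "a character" will be off for at least some input.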
