On Sun, May 20, 2012 at 10:37 PM, Austin Ziegler <[email protected]> wrote:
> On Wed, May 16, 2012 at 9:02 AM, Brian Candler <[email protected]> wrote:
>> I will add that the OP is not entirely alone in his opinion.
>
> The OP may not be alone in his opinion, but that's because encodings
> are broken in general.
>
> This is *not* a Ruby problem, this is a *data* problem.

I couldn't agree more.

> C gets it wrong because it assumes that characters, code points, and
> bytes are the same (but it gets a pass because it was created in a
> time when this was true).

And C++, at least, has facilities for multibyte characters.

> Java gets it wrong because it uses a nominally-UTF-16 character width
> (it's actually UCS-2) which doesn't allow for UTF-16 surrogates.

Actually the situation with Java is even worse: with Java 5 they
added methods for dealing with the Unicode code points that cannot
be represented in 16 bits.  See here for example:
http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isDigit%28int%29

Unfortunately they had to maintain compatibility with the old 16-bit
char type and hence used int for the 32-bit code point
representation.  I am not sure how often these new methods are
actually used, but my guess would be: rarely.  If you use them, the
code becomes messy quickly.
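
To make the 16-bit limitation concrete, here is a minimal sketch in
Ruby (rather than Java, so it can be pasted into irb).  The code point
U+1D7D8 is just an illustrative choice of a digit outside the Basic
Multilingual Plane; the variable names are mine.  It shows that one
such code point needs two UTF-16 code units, which is why a single
16-bit char cannot hold it:

```ruby
# U+1D7D8 (MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO) lies outside the BMP.
# It is used here purely as an example of a supplementary code point.
digit = "\u{1D7D8}"

code_points = digit.codepoints                       # Unicode code points
utf16_units = digit.encode("UTF-16BE").unpack("n*")  # 16-bit code units

puts code_points.map { |cp| format("U+%04X", cp) }.join(" ")
puts utf16_units.map { |u| format("0x%04X", u) }.join(" ")
puts "characters: #{digit.length}, UTF-16 units: #{utf16_units.length}"
```

The string is one character and one code point, but it encodes to the
surrogate pair 0xD835 0xDFD8 in UTF-16.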

> [...]
> Ruby got it right, because it acknowledges that (a) this is hard and
> (b) gives you the tools you need in order to make this less painful.
> It also doesn't (c) incorrectly assume that everything is or can be
> expressed safely in Unicode. (Shift-JIS will not roundtrip to Unicode
> and back for some characters.)

+1 to this and your other statements.
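
As a small illustration of the tools Ruby gives you, here is a sketch
that assumes some bytes arrived with no encoding information and are
in fact ISO-8859-1.  The sample bytes and variable names are made up
for the example; the methods are plain String methods:

```ruby
# Four raw bytes: "caf" followed by 0xE9 (e-acute in ISO-8859-1).
# Array#pack with "C*" yields a binary (ASCII-8BIT) string.
raw = [0x63, 0x61, 0x66, 0xE9].pack("C*")
puts raw.encoding

# Relabelling the bytes as UTF-8 changes no data, and Ruby can tell
# you that the label is wrong: a lone 0xE9 is not valid UTF-8.
guessed_utf8 = raw.dup.force_encoding("UTF-8")
puts guessed_utf8.valid_encoding?

# Label the bytes with their real encoding, then transcode explicitly.
latin1 = raw.dup.force_encoding("ISO-8859-1")
utf8   = latin1.encode("UTF-8")
puts utf8.encoding
puts utf8.bytesize   # the e-acute now takes two bytes
```

The point is that force_encoding only changes the label while encode
converts the bytes, and that Ruby leaves the choice between the two to
you instead of guessing.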

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
