Here's a J model for converting between utf8 and utf32 (two representations of unicode).
The model assumes characters are represented as numbers. We do not currently have 32 bit character literals. The conversion uses numeric properties of the characters.

To convert 8 bit literals to numbers, you can use:

   charnum=: a.&i.

Here's the model:

   CPLEN=:1 0 2 3 4 5 6 0 #~ 2>.2^i.-8
   CPBASE=: 2^(#~ 2>.2&^)i.-8
   CPOFF=: _128+ CPLEN i. ~.CPLEN
   utf8len=: {&CPLEN
   utf8dat=: {&CPBASE | ]
   utf8to32=: *@utf8len (64 #. utf8dat);.1 ]
   utf32to8=: [: ; <@((+ 128 + # {. CPOFF {~ #)@(#.inv~&64)^:(>&127))"0

Here's an illustration that these mechanisms are consistent with existing utf-8 support for a.

   (utf32to8 i.256)-: charnum 8 u: 2 u: a.
1
   (i.256)-: utf8to32 charnum 8 u: 2 u: a.
1

Here's a test for valid utf-8:

   isutf8=: # -: CPLEN +/@:{~ ]

This is not a complete test, because it only ensures that the right characters are present -- it does not ensure that they are ordered properly.

Here's a version of the conversion from utf8 which fails if it's given invalid utf8:

   utf8valid=: [ assert@isutf8
   utf8to32valid=: *@utf8len (64 #. utf8dat)@utf8valid;.1 utf8valid

This provides a complete test since individual characters cannot be valid if the characters are not arranged properly.

FYI,

-- 
Raul
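
For readers who want to try the model beyond i.256, here is a short sketch of a session on a three-byte character (U+20AC, the euro sign, code point 8364). It assumes the definitions from the post are already loaded. The results are hand-traced from those definitions rather than pasted from a live session, so treat them as something to confirm, not as authoritative output.

```j
NB. Assumes CPLEN, CPBASE, CPOFF, utf8to32, utf32to8 and isutf8
NB. from the post are defined. Results below are hand-traced.

   utf32to8 8364           NB. U+20AC
226 130 172
   utf8to32 226 130 172
8364
   isutf8 226 130 172
1
   isutf8 130 226 172      NB. same bytes, wrong order: the count-only test passes
1
   utf32to8 2048           NB. U+0800; standard UTF-8 would be 224 160 128
224 128
   isutf8 utf32to8 2048
0
```

The misordered bytes 130 226 172 are the case the post describes: isutf8 accepts them, while utf8to32valid should signal an assertion failure, because the piece that starts at the lead byte 226 has only two bytes. The last two lines suggest a caveat worth checking: utf32to8 takes its byte count from the number of base-64 digits, so a code point whose leading digit does not fit in the lead byte's payload bits (2048 through 4095, for example) appears to come out one byte short.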