I take it that this table describes the encoding of the byte stream:

http://en.wikipedia.org/wiki/UTF-8#Description

(I might actually attempt this in APL, just to see whether I can do it
while waiting for a built-in translation...)
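
For reference, here is a rough sketch of what that table implies, written in
Python rather than APL so it is easy to check. It assumes the input is a plain
sequence of integers in the range 0..255, and the function name
utf8_to_code_points is made up for illustration. It follows only the lead-byte
and continuation-byte bit patterns from the table and does not reject overlong
forms or surrogates, so it is not a complete validator.

```python
def utf8_to_code_points(byte_values):
    """Decode UTF-8 byte values (ints 0..255) into a list of code points.

    Hypothetical helper for illustration; it checks bit patterns only and
    does not reject overlong encodings or surrogate code points.
    """
    code_points = []
    i = 0
    n = len(byte_values)
    while i < n:
        lead = byte_values[i]
        if lead < 0x80:                 # 0xxxxxxx: single byte
            extra, cp = 0, lead
        elif lead >> 5 == 0b110:        # 110xxxxx: one continuation byte
            extra, cp = 1, lead & 0x1F
        elif lead >> 4 == 0b1110:       # 1110xxxx: two continuation bytes
            extra, cp = 2, lead & 0x0F
        elif lead >> 3 == 0b11110:      # 11110xxx: three continuation bytes
            extra, cp = 3, lead & 0x07
        else:
            raise ValueError("invalid lead byte at index %d" % i)
        if i + extra >= n:
            raise ValueError("truncated sequence at index %d" % i)
        for b in byte_values[i + 1:i + 1 + extra]:
            if b >> 6 != 0b10:          # continuation bytes are 10xxxxxx
                raise ValueError("invalid continuation byte after index %d" % i)
            cp = (cp << 6) | (b & 0x3F)  # append the 6 payload bits
        code_points.append(cp)
        i += 1 + extra
    return code_points
```

Given the bytes 102 111 111 226 141 137 98 97 114 (the UTF-8 encoding of
'foo⍉bar'), this yields 102 111 111 9033 98 97 114, which is the vector that
⎕UCS expects in the example quoted below.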



On Sun, Apr 27, 2014 at 10:00 PM, Elias Mårtenson <loke...@gmail.com> wrote:

> To convert byte values to code points, you need to apply an encoding
> algorithm, and that's kind of messy.
>
> (I believe the rest of GNU APL kind of assumes that UTF-8 is the standard
> encoding used, which does make things simpler).
>
> I have a suggestion: Make ⎕UCS support a dyadic form where the left-hand
> side specifies the encoding to use. For example:
>
>       'UTF-8' ⎕UCS 99 100 101 102
>
>
> Handling multiple encodings is easily done through the *libiconv* library.
> I worked on it when I made some improvements to its Common Lisp
> integration. It's quite simple to use.
>
> Regards,
> Elias
>
>
> On 28 April 2014 12:49, David B. Lamkins <dlamk...@gmail.com> wrote:
>
>> That's close, but libfileio[8] returns a sequence of byte values, not
>> code points.
>>
>> On Mon, 2014-04-28 at 12:19 +0800, Elias Mårtenson wrote:
>> > Use the quad function ⎕UCS:
>> >
>> >
>> >       ⎕UCS 'foo⍉bar'
>> > 102 111 111 9033 98 97 114
>> >       ⎕UCS 102 111 111 9033 98 97 114
>> > foo⍉bar
>> >
>> >
>> > Regards,
>> > Elias
>> >
>> >
>> > On 28 April 2014 12:17, David B. Lamkins <dlamk...@gmail.com> wrote:
>> >         I can use lib_file_io to read a sequence of byte values from a
>> >         file
>> >         containing Unicode text.
>> >
>> >         How do I convert that sequence back to a Unicode string in GNU
>> >         APL?
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>


-- 
"The secret to creativity is knowing how to hide your sources."
   Albert Einstein


http://soundcloud.com/davidlamkins
http://reverbnation.com/lamkins
http://reverbnation.com/lcw
http://lamkins-guitar.com/
http://lamkins.net/
http://successful-lisp.com/
