Re: encode $encoding list ???

ToddAndMargo via perl6-users Mon, 20 Jun 2022 01:14:27 -0700

On 6/19/22 23:11, Kevin Pye wrote:



On Mon, 20 Jun 2022, at 15:59, ToddAndMargo via perl6-users wrote:

I had to use "utf8-c8" to keep the line
from crashing.


Well, any of "iso-8859-1" (or "latin-1"), "windows-1251", "windows-1252" or 
"windows-932" would also have avoided the crash. In fact, any encoding which 
maps each 8-bit byte to a single character would work. The default utf8 decode 
crashes because the byte sequence 0x84, 0x73 is not a valid utf8 sequence.
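
A minimal sketch of the point above, using the sample bytes from this thread. The variable names are only illustrative, and `try` is used here simply to keep the failed decode from ending the program:

```raku
# Sample bytes from this thread; 0x84 on its own is a UTF-8 continuation byte.
my $bytes = Buf.new(0x84, 0x73, 0x77, 0x84, 0x79);

# The default decode (utf8) throws on malformed input; try turns that into Nil.
my $as-utf8 = try $bytes.decode;
say $as-utf8.defined ?? "utf8 decoded" !! "utf8 decode failed";

# iso-8859-1 maps every byte to the code point with the same value,
# so each input byte becomes exactly one character.
my $as-latin1 = $bytes.decode("iso-8859-1");
say $as-latin1.chars;
say $as-latin1.ords;
```

The decoded iso-8859-1 string still contains the unprintable U+0084, so it is best inspected with `.ords` or `dd` rather than printed directly.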

utf8-c8 is a synthetic encoding unique to Raku which guarantees that a buf can 
be decoded into a string, modified as desired, and then encoded back into a buf 
without modifying any characters which haven't been explicitly changed. This is 
not guaranteed by other encode/decode pairs because of Unicode normalisation. 
It is required for things like file names which generally don't follow Unicode 
rules on many filesystems. You can't take a generic Linux (for example) 
filename and convert that to a string using a utf8 encoding because it might 
not be a valid utf8 string, despite being a valid filename.
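
As a rough illustration of the round-trip point, here is a sketch under two assumptions: that "e" followed by a combining acute accent is a fair example of a sequence affected by normalisation, and that the thread's sample bytes stand in for a non-UTF-8 filename. The variable names are made up for the example:

```raku
# "e" followed by U+0301 COMBINING ACUTE ACCENT, as valid UTF-8 bytes.
my $decomposed = Buf.new(0x65, 0xCC, 0x81);

# Plain utf8: the string is normalised, so re-encoding yields the
# precomposed U+00E9 (bytes 0xC3 0xA9), not the original three bytes.
my $via-utf8 = $decomposed.decode("utf8").encode("utf8");
say $via-utf8.list;

# utf8-c8 on the invalid bytes from this thread: decode, then encode back.
my $bytes      = Buf.new(0x84, 0x73, 0x77, 0x84, 0x79);
my $round-trip = $bytes.decode("utf8-c8").encode("utf8-c8");

# Compare byte by byte, since encode may return a different Blob type than Buf.
say $round-trip.list.join(",") eq $bytes.list.join(",");
```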


I just wanted an output that did not crash.

I have been working with strings deliberately
full of unprintable characters, so I wanted to
show an example.

say Buf.new(97,98,99).decode

Is cute, but does not help when dealing with
unprintable characters.


This is my updated section on Buffer to String:

Buffer to String:
   > say Buf.new(97,98,99).decode
   abc

   > print Buf.new(0x84, 0x73, 0x77, 0x84, 0x79).decode("utf8-c8") ~ "\n"
   􏿽x84sw􏿽x84y
   (It is supposed to look like nonsense)

   > dd Buf.new(0x84, 0x73, 0x77, 0x84, 0x79)
   Buf.new(132,115,119,132,121)

   > dd Buf.new(0x84, 0x73, 0x77, 0x84, 0x79).decode("utf8-c8")
   "􏿽x84sw􏿽x84y"


   for a pretty output:
      > print Buf.new(ord("a"), ord("b"), ord("c")).decode("ascii") ~ "\n";
      abc
      > dd $x
      Buf[uint8] $x = Buf[uint8].new(97,98,99)

Decoding values, see: https://docs.raku.org/routine/encoding#class_IO::Handle

utf8
utf16
utf16le
utf16be
utf8-c8
iso-8859-1
windows-1251
windows-1252
windows-932
ascii
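
To see which of these names accept a given buffer, something like the following could be tried. It is only a sketch: the `%decoded-ok` hash is an illustrative name, and which encodings succeed may depend on the Rakudo version, so no output is shown:

```raku
# Sample bytes from this thread.
my $bytes = Buf.new(0x84, 0x73, 0x77, 0x84, 0x79);

# Record, per encoding name, whether decode succeeded.
my %decoded-ok;

for <utf8 utf16 utf16le utf16be utf8-c8 iso-8859-1
     windows-1251 windows-1252 windows-932 ascii> -> $enc {
    # try yields Nil if decode throws (bad bytes or unsupported encoding name).
    my $str = try $bytes.decode($enc);
    %decoded-ok{$enc} = $str.defined;
    say $enc, ": ", $str.defined ?? "decoded" !! "failed";
}
```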



-T
Now I am updating my section on bitwise operations and buffers.
