On 6/19/22 23:11, Kevin Pye wrote:
On Mon, 20 Jun 2022, at 15:59, ToddAndMargo via perl6-users wrote:
I had to use "utf8-c8" to keep the line
from crashing.
Well, any of "iso-8859-1" (or "latin-1"), "windows-1251", "windows-1252" or
"windows-932" would also have avoided the crash. In fact, any encoding which
maps each 8-bit byte to exactly one character would work. The default utf8
decode fails because the byte sequence 0x84, 0x73 is not a valid utf8 sequence.
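To make that concrete, here is a minimal sketch using the same bytes as in the thread. The variable names ($raw, $utf8-ok, $latin1) are only for illustration. It assumes a current Rakudo, where a strict utf8 decode of a malformed buffer throws and `try` turns that into an undefined value:

```raku
# The same five bytes used elsewhere in this thread.
my $raw = Buf.new(0x84, 0x73, 0x77, 0x84, 0x79);

# Strict utf8 decoding is expected to throw here, because 0x84 is a
# continuation byte with no lead byte in front of it. `try` swallows
# the exception and yields an undefined value instead.
my $utf8-ok = (try $raw.decode("utf8")).defined;
say "utf8 decode succeeded: $utf8-ok";

# A single-byte encoding maps every byte to exactly one character,
# so there is no byte sequence it can reject.
my $latin1 = $raw.decode("iso-8859-1");
say $latin1.chars;   # one character per input byte
say $latin1.ords;    # the code points, matching the original byte values
```

Note that this only shows the decode does not fail. It says nothing about whether the resulting characters are printable.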
utf8-c8 is a synthetic encoding unique to raku which guarantees that a buf can
be decoded into a string, modified as desired, and then encoded back into a buf
without modifying any characters which haven't been explicitly changed. This is
not guaranteed by other encode/decode pairs because of unicode normalisation.
It is required for things like file names which generally don't follow unicode
rules on many filesystems. You can't take a generic Linux (for example)
filename and convert that to a string using a utf8 encoding because it might
not be a valid utf8 string, despite being a valid filename.
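The round trip described above can be sketched roughly as follows. This is an illustration, not a definitive test of the guarantee: the variable names are made up, and it assumes that `.subst` on a utf8-c8 string leaves the synthetic characters for the invalid bytes alone.

```raku
# Bytes that are not valid utf8 (0x84 appears without a lead byte).
my $raw = Buf.new(0x84, 0x73, 0x77, 0x84, 0x79);

# Decode with utf8-c8: the invalid bytes are kept as synthetic characters.
my $str = $raw.decode("utf8-c8");

# Encoding the untouched string should give back the original bytes.
my $same = $str.encode("utf8-c8");
say $same.list.join(",") eq $raw.list.join(",");

# Change only the valid part of the string, then encode again.
# The bytes that were not explicitly changed should come back as they were.
my $back = $str.subst("sw", "SW").encode("utf8-c8");
say $back.list;
```

Comparing via `.list.join(",")` sidesteps any type difference between the original Buf and the blob returned by `.encode`.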
I just wanted an output that did not crash.
I have been working with strings deliberately
full of unprintable characters, so I wanted to
show an example.
say Buf.new(97,98,99).decode
Is cute, but does not help when dealing with
unprintable characters.
This is my updated section on Buffer to String:
Buffer to String:
> say Buf.new(97,98,99).decode
abc
> print Buf.new(0x84, 0x73, 0x77, 0x84, 0x79).decode("utf8-c8") ~ "\n"
x84swx84y
(It is supposed to look like nonsense.)
> dd Buf.new(0x84, 0x73, 0x77, 0x84, 0x79)
Buf.new(132,115,119,132,121)
> dd Buf.new(0x84, 0x73, 0x77, 0x84, 0x79).decode("utf8-c8")
"x84swx84y"
For a pretty output:
> print Buf.new(ord("a"), ord("b"), ord("c")).decode("ascii") ~ "\n";
abc
> dd $x
Buf[uint8] $x = Buf[uint8].new(97,98,99)
Decoding values, see:
https://docs.raku.org/routine/encoding#class_IO::Handle
utf8
utf16
utf16le
utf16be
utf8-c8
iso-8859-1
windows-1251
windows-1252
windows-932
ascii
-T
Now I am updating my section on bitwise operations and buffers.