Re: [bitc-dev] BitC Strings and Unicode

William ML Leslie Mon, 20 Oct 2014 05:18:08 -0700

On 16 October 2014 04:21, Jonathan S. Shapiro <[email protected]> wrote:
> 1. Do we all buy the story that Strings are conceptually used for text?


The primary string type, with the most straightforward literal, which
is given to you by a wide range of library functions, should probably
be a text type.

I think there is still a need for bytes literals, bytes formatting,
and bytes equivalents of String.join (.partition, .split ...).
Python 3 originally tried to remove some of these things, but ended
up back-pedalling.

I've used languages that didn't have a first-class bytes type, and
I've seen programmers jump from byte[] to String and back just to use
string methods on network packets.  I recommend having a
fully-featured bytestring type.
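
As a rough illustration of what "fully-featured" could mean, here is
a minimal sketch using Python 3's bytes type, which carries most of
the string methods.  The packet contents are invented for the
example, and the %-formatting line assumes Python 3.5 or later, where
bytes formatting was restored.

```python
# Hypothetical HTTP-like packet, made up for illustration only.
packet = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"

# Split headers from body without ever decoding to a text string.
head, sep, body = packet.partition(b"\r\n\r\n")
lines = head.split(b"\r\n")
method, path, version = lines[0].split(b" ")

# join works on bytes just as it does on text.
rebuilt = b"\r\n".join(lines) + sep + body

# bytes %-formatting (assumes Python 3.5+).
status = b"%s %d" % (b"HTTP/1.1", 200)
```

Nothing here round-trips through a text type, which is the
byte[] -> String -> byte[] dance I would like to avoid.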

> 2. Does following set of rules for strings make sense? If no, why not?
>
> Strings are normalized via NFC
> String operations preserve NFC encoding
> Strings are encoded in UTF-8
> Strings are indexed by the byte

You could probably convince me of this.  In my head I want the
indices to be opaque, so that you can't obtain part of a character
and there is no need for runtime index checking once an index has
been obtained.  OTOH, sometimes the size of a section of the string
really matters.
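
To make the trade-off concrete, here is a small sketch in Python of
what byte indexing into NFC-normalized UTF-8 looks like.  The helper
char_boundaries is a hypothetical name of my own, not anything
proposed for BitC; it only illustrates which byte offsets an opaque
index would be allowed to hold.

```python
import unicodedata

# 'e' followed by a combining acute accent: not in NFC form.
decomposed = "e\u0301tude"
text = unicodedata.normalize("NFC", decomposed)  # now starts with U+00E9
data = text.encode("utf-8")

code_points = len(text)   # counts code points
byte_length = len(data)   # counts UTF-8 bytes; U+00E9 takes two

def char_boundaries(buf):
    """Byte offsets where a code point starts, plus the end offset.

    UTF-8 continuation bytes all have the bit pattern 10xxxxxx.
    """
    starts = [i for i, b in enumerate(buf) if (b & 0xC0) != 0x80]
    return starts + [len(buf)]

valid_offsets = char_boundaries(data)

# A raw byte index can land in the middle of a character.
try:
    data[:1].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

# Two strings that are each in NFC can concatenate to a non-NFC
# string, so "operations preserve NFC" takes real work.
left, right = "e", "\u0301"
joined = left + right
joined_is_nfc = unicodedata.normalize("NFC", joined) == joined
```

An opaque index type would only ever carry one of valid_offsets, so
no check is needed when it is used; the cost is that you cannot do
arithmetic on it directly when the byte size is what you care about.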

-- 
William Leslie

Notice:
Likely much of this email is, by the nature of copyright, covered
under copyright law.  You absolutely MAY reproduce any part of it in
accordance with the copyright law of the nation you are reading this
in.  Any attempt to DENY YOU THOSE RIGHTS would be illegal without
prior contractual agreement.
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
