On Thu, 15 Jan 2009, Bryan Jurish wrote:
> Would anyone object if the [any2string] semantics were changed so that
> only "unsigned char" values in the range (0..255) get output, rather
> than (as is currently the case) "signed char" values in the range
> (-128..127)?
I would object, as I expect to be able to put in values in the range 0 to
1114111 (the full Unicode range), or at the very least the part of Unicode
that people would actually use; 65535 is probably not enough. I'd recommend
storing strings as either UCS-4 or UTF-8, but in the latter case you have a
variable number of bytes per character to take care of. Internally, I
believe that UCS-4 (a 32-bit encoding) is perfectly fine, as each element of
Pd's lists of floats is going to take 64 bits or 128 bits anyway (wasting
nearly half or 3/4 of the bits, depending on whether you have a 32-bit or
64-bit OS/mode).
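
To make the size argument concrete, here is a small Python sketch (not Pd code). It assumes that each list element is held as a single-precision float, and checks that every Unicode code point still fits exactly in such a value. The helper names `survives_float32` and `encoded_sizes` are made up for this illustration.

```python
import struct

MAX_CODE_POINT = 0x10FFFF  # 1114111, the highest Unicode code point


def survives_float32(n):
    """Return True if integer n round-trips exactly through a 32-bit float."""
    return struct.unpack("f", struct.pack("f", float(n)))[0] == n


def encoded_sizes(text):
    """Return (UTF-8 byte count, UCS-4 byte count) for the same text."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))
```

A 32-bit float represents integers exactly up to 2^24 (16777216), so the whole 0..1114111 range is safe under that assumption, whereas UTF-8 sizes vary with the characters used.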
What's important to me is that the Pd user does not have to struggle with
making Pd interpret UTF-8's variable-length encoding, and only has to
struggle with making Pd work with lists of characters, which is already
enough work anyway. I like that [list length] gives me the number of
characters and not the number of bytes, because the latter is rarely
significant.
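
The difference between counting characters and counting bytes can be sketched in Python as follows. This only illustrates the idea and is not how [any2string] or [list length] are implemented; the function names are hypothetical, and "character" here means one Unicode code point.

```python
def utf8_to_code_points(data):
    """Decode UTF-8 bytes into a list with one integer per code point."""
    return [ord(ch) for ch in data.decode("utf-8")]


def code_points_to_utf8(points):
    """Encode a list of code points back into UTF-8 bytes."""
    return "".join(chr(p) for p in points).encode("utf-8")
```

With a list of code points, the list length is the character count; the byte count only matters again when the text is written back out as UTF-8.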
_ _ __ ___ _____ ________ _____________ _____________________ ...
| Mathieu Bouchard - tél:+1.514.383.3801, Montréal, Québec
_______________________________________________
Pd-list@iem.at mailing list