On Thu, 15 Jan 2009, Bryan Jurish wrote:

Would anyone object if the [any2string] semantics were changed so that only "unsigned char" values in the range (0..255) get output, rather than (as is currently the case) "signed char" values in the range (-128..127)?

I would object, as I expect to be able to put values in the range 0 to 1114111, or at the very least the range of Unicode that people would actually use; 65535 is probably not enough. I'd recommend storing strings as either UCS-4 or UTF-8, but in the latter case you have a variable number of bytes per character to take care of. Internally, I believe that UCS-4 (a 32-bit encoding) is perfectly good, as Pd's lists of floats are going to be stored in 64 or 128 bits per element anyway (wasting nearly half or 3/4 of the bits, depending on whether you have a 32-bit or 64-bit OS/mode).
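
To make the trade-off above concrete, here is a minimal sketch in Python (not Pd code, and not how [any2string] is implemented). It assumes that Pd list elements are single-precision floats, and checks that the highest Unicode code point still round-trips exactly through such a float; the helper names survives_float32 and storage_bytes are made up for this illustration.

```python
import struct

# Highest Unicode code point (0x10FFFF), i.e. the 1114111 mentioned above.
MAX_CODE_POINT = 0x10FFFF


def survives_float32(code_point: int) -> bool:
    """Return True if the integer round-trips exactly through a 32-bit float.

    Assumption: a Pd list element holds a single-precision float.
    """
    packed = struct.pack("<f", float(code_point))
    return struct.unpack("<f", packed)[0] == code_point


def storage_bytes(text: str) -> dict:
    """Compare character count with UTF-8 and UCS-4 (UTF-32) byte counts."""
    return {
        "characters": len(text),                # one entry per code point
        "utf8": len(text.encode("utf-8")),      # 1 to 4 bytes per code point
        "ucs4": len(text.encode("utf-32-le")),  # always 4 bytes per code point
    }


if __name__ == "__main__":
    print(survives_float32(MAX_CODE_POINT))
    print(storage_bytes("Montr\u00e9al"))
```

Every code point is below 2^24, which is why a single-precision float can carry it without loss; the UTF-8 count is the only one of the three figures that depends on which characters are in the string.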

What's important to me is that the Pd user does not have to struggle with making Pd interpret UTF-8's variable-length encoding, and only has to struggle with making Pd work with lists of characters, which is already enough work. I like that [list length] gives me the number of characters and not the number of bytes, because the latter is rarely significant.
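
The difference between the two counts can be sketched like this, again in Python rather than Pd. The functions string_to_list and list_to_string are hypothetical stand-ins for a string-to-list conversion that emits one float per character; they are not the actual [any2string] or [string2any] objects.

```python
def string_to_list(text: str) -> list:
    """Turn a string into a list of floats, one per Unicode code point."""
    return [float(ord(ch)) for ch in text]


def list_to_string(values: list) -> str:
    """Inverse of string_to_list: rebuild the string from code points."""
    return "".join(chr(int(v)) for v in values)


if __name__ == "__main__":
    word = "Qu\u00e9bec"
    as_list = string_to_list(word)
    # Length of the list is the number of characters.
    print(len(as_list))
    # Length of the UTF-8 encoding is the number of bytes, which differs
    # as soon as a non-ASCII character such as the accented e appears.
    print(len(word.encode("utf-8")))
    print(list_to_string(as_list) == word)
```

With one list element per character, the length of the list is the character count directly, and the user never has to decode multi-byte sequences by hand.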

 _ _ __ ___ _____ ________ _____________ _____________________ ...
| Mathieu Bouchard - tél:+1.514.383.3801, Montréal, Québec
_______________________________________________
Pd-list@iem.at mailing list
