moin Mathieu, moin all,

On 2009-01-15 16:33:03, Mathieu Bouchard <ma...@artengine.ca> appears to have written:
> On Thu, 15 Jan 2009, Bryan Jurish wrote:
>
>> Would anyone object if the [any2string] semantics were changed so that
>> only "unsigned char" values in the range (0..255) get output, rather
>> than (as is currently the case) "signed char" values in the range
>> (-128..127)?
>
> What's important to me is that the Pd user does not struggle with making
> pd interpret UTF-8 variable-length encoding, and instead struggles with
> making pd work with lists of characters, which is already enough work
> anyway.

Agreed (in principle at least)... At the risk of repeating myself, I wrote [any2string] and [string2any] as quick ugly hacks to get some sort of rudimentary string handling in pd. Roman mentioned a few other externals (e.g. [comport]) which expect unsigned raw byte values, which I think is sufficient reason to change the (byte-oriented) conventions of [any2string].

Unicode might be more immediately intuitive to most users, but when it comes down to it, byte-strings are IMHO the more basic representation (a char* is still a char*, even in this post-unicode world). Some of us even still use non-unicode encodings by default. A good string handling mechanism should have a good general default representation (e.g. as UTF-${MachineWordBits}), but should likewise allow access to "raw" byte strings, and be able to accommodate various encodings. Not that I'm really hankering to write any of that, mind you ;-)

Perhaps a better name for the external as I think of it would be [any2bytes]. I'm perfectly willing to cede the "string" name to something better (Martin's string patch comes to mind), but that's just a labelling issue (and since variable names are arbitrary, and externals are in some sense variables, external names must therefore also be arbitrary ;-)

> I like that [list length] gives me the number of characters and
> not the number of bytes, because the latter is rarely significant.

... except if you're building or reading a persistent index for a large file, in which case tell() & seek() are likely to be a wee bit faster than parsing and counting variable-length-encoded characters ...

marmosets,
  Bryan

--
Bryan Jurish                      "There is *always* one more bug."
jur...@ling.uni-potsdam.de        -Lubarsky's Law of Cybernetic Entomology
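
For illustration only, the two technical points in this message can be sketched in Python (not the C of the actual externals, and not their code). The sketch shows the signed versus unsigned byte conventions, and a byte-offset index read back with tell() and seek(). All function names and the sample text are hypothetical.

```python
import io

def to_unsigned(signed_bytes):
    """Map "signed char" values (-128..127) to unsigned byte values (0..255)."""
    return [b & 0xFF for b in signed_bytes]

def to_signed(unsigned_bytes):
    """Inverse mapping: unsigned byte values (0..255) to "signed char" values."""
    return [b - 256 if b > 127 else b for b in unsigned_bytes]

def build_line_index(stream):
    """Record the byte offset at which each line starts, using tell()."""
    offsets = []
    while True:
        pos = stream.tell()
        if not stream.readline():  # empty bytes object means end of stream
            break
        offsets.append(pos)
    return offsets

def read_line_at(stream, offsets, n):
    """Jump straight to line n with seek(); earlier lines are never decoded."""
    stream.seek(offsets[n])
    return stream.readline().decode("utf-8").rstrip("\n")

# Hypothetical sample text with two non-ASCII characters (u-umlaut and pi).
text = "moin\n\u00fcber\n\u03c0d\n"
data = text.encode("utf-8")

# Character count and byte count diverge once multi-byte characters appear.
n_chars = len(text)
n_bytes = len(data)

# The same bytes under both conventions.
unsigned = list(data)
signed = to_signed(unsigned)

# A persistent index would store these offsets; here they stay in memory.
stream = io.BytesIO(data)
index = build_line_index(stream)
second_line = read_line_at(stream, index, 1)
```

The index stores byte offsets rather than character positions, so a lookup is a single seek() however many variable-length characters precede the target line.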