I volunteered to look into this for scripts, and found that UTF-8 encoding is a safe way to go. There are many string encoding/decoding standards/codecs (UTF, UCS, etc.), with variants within each family; see http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings. Strings have ended up in the same situation as video codecs. Geez. Anyhoo...
Requirements: In general, I think we want to encode and decode arbitrary binary strings into text strings that can be entered from any keyboard, saved and decoded losslessly in the blend file, displayed on the user's computer, safely sent by email, used as parts of URLs or in an HTTP POST request, be valid as a filename, etc. I think that UTF-8 would suit our purposes now and for the next decade or two. UTF-8 can encode any Unicode character.

The downside is that normal C string handling is byte-oriented and knows nothing about multi-byte characters. Unlike UTF-16, UTF-8 never puts zero bytes inside an encoded character, so simple operations such as copy keep working, but anything that counts characters, truncates at a fixed length, or changes case can silently cut a multi-byte sequence in half. This means that a pass through the ENTIRE code base is needed to seek out the str functions that make one-byte-per-character assumptions and replace them with encoding-aware encode/decode calls (see the small sketch in the P.S.).

In 2007, Python adopted UTF-8 as its default source encoding and recoded their base to use it: http://www.python.org/dev/peps/pep-3120/ For a Py3 discussion of handling undecodable bytes, see http://www.python.org/dev/peps/pep-0383/. For displaying encoded strings, Py3k uses a pretty involved process: http://www.python.org/dev/peps/pep-3138/ For identifiers in the Python code base itself, as of 2007 they also had issues and more questions than answers (see http://www.python.org/dev/peps/pep-3131/), and the bottom line there is: plain English/ASCII characters.

UTF-16 is the obvious alternative for international error messages etc., and is what Mac OS X and Windows use internally. It too can encode any Unicode character. There are some space-saving advantages to UTF-16, but only if the text consists mostly of non-Latin characters: characters U+0800 through U+FFFF take three bytes in UTF-8 but only two in UTF-16. As a result, text in (for example) Chinese, Japanese or Hindi could take more space in UTF-8 if those characters outnumber the ASCII characters. This rarely happens in real documents; for example, both the Japanese and the Korean UTF-8 articles on Wikipedia take more space if saved as UTF-16 than in their original UTF-8 form.

--Roger
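P.S. To make the byte-vs-character point concrete, here is a minimal standalone C sketch (not Blender code; utf8_count is just a throwaway helper written for this example). It assumes the compiler stores the string literals as UTF-8, which gcc and clang do by default when the source file is UTF-8.

/* Byte-oriented C string functions keep working on UTF-8 because a
 * multi-byte sequence never contains a 0x00 byte, but they count
 * bytes, not characters. */
#include <stdio.h>
#include <string.h>

/* Count Unicode code points in a UTF-8 string: every byte that is not
 * a continuation byte (10xxxxxx) starts a new character. */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}

int main(void)
{
    const char *ascii = "Blender";
    const char *greek = "πρότυπο";   /* two bytes per character in UTF-8 */
    const char *kana  = "ブレンダー"; /* three bytes per character in UTF-8,
                                        but only two each in UTF-16 */

    printf("%s: bytes=%zu chars=%zu\n", ascii, strlen(ascii), utf8_count(ascii));
    printf("%s: bytes=%zu chars=%zu\n", greek, strlen(greek), utf8_count(greek));
    printf("%s: bytes=%zu chars=%zu\n", kana,  strlen(kana),  utf8_count(kana));

    /* strcpy/strcat/strcmp still work byte-wise; what breaks is anything
     * that assumes one byte == one character (column widths, truncation,
     * toupper, etc.). */
    return 0;
}

The katakana line should report 15 bytes for 5 characters; those same 5 characters would take 10 bytes in UTF-16, which is the size trade-off mentioned above.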