On Wed, 11 May 2005 20:02:25 +0200, "Fredrik Lundh" <[EMAIL PROTECTED]> wrote:
>Skip Montanaro wrote:
>
>>     Fredrik> does the CSV format even support Unicode-encoded data streams?
>>
>> Based on the requests I've seen here and on the [EMAIL PROTECTED] mailing
>> list, it appears people are certainly generating CSV files which contain
>> Unicode-encoded data.
>
>in what encodings?
>
>is the encoding specified inside the file?  if so, how?
>
>(it should be noted that the phrase "Unicode-encoded data" that I
>used doesn't make much sense, even in the original context.  what
>I meant to say was that CSV, as far as I know, isn't defined as a
>stream of Unicode character, but rather as a stream of bytes in an
>ASCII-compatible encoding.  this means that you can use e.g.
>ISO-8859-1 or UTF-8 for string values, but not that you can encode the
>whole thing as, say UTF-16 or UCS-4).

The CSV format is not defined at all, AFAIK. Empirically, writing CSV
works more-or-less like this, for each row:

    # pseudocode, untested
    control_chars = '\r\n' # or maybe more or maybe just '\n'
    out_list = []
    for each field:
        if field contains quote_char:
            out_field = quote_char + \
                field.replace(quote_char, quote_char + quote_char) + \
                quote_char
        elif field contains any one of delimiter or control_chars:
            out_field = quote_char + field + quote_char
        else:
            out_field = field
        out_list.append(out_field)

then you write delimiter.join(out_list) followed by "\r\n"

So there is no reason at all why a writer and a reader couldn't use the
above quoting mechanism to transfer columnar data containing Unicode --
they just have to agree on the encoding, control characters, quote_char,
delimiter, and line terminator.

Excel (see my other post in this thread) provides a writing ("save as
Unicode text") and reading mechanism which uses u'\t' as the delimiter,
u'\r\n' as the line terminator, u'\"' as the quote_char, and utf-16 as
the encoding. I haven't done an exhaustive check to see what its
definition of control_chars would be.
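For anyone who wants to try the quoting rule above, here is a minimal
runnable sketch in present-day Python 3 (str is already Unicode there,
so no u'' prefixes). The names quote_field and format_row are made up
for illustration, and control_chars is just the '\r\n' guess from the
pseudocode, not a statement of what Excel actually treats as special.
The writer side uses the tab / "\r\n" / '"' / utf-16 settings mentioned
for Excel's "Unicode text"; the reader side decodes with the agreed
encoding and hands the text to the stdlib csv module.

```python
import csv
import io


def quote_field(field, delimiter=",", quote_char='"', control_chars="\r\n"):
    """Quote a single field using the rule from the pseudocode (hypothetical helper)."""
    if quote_char in field:
        # Double every embedded quote character, then wrap the field.
        return quote_char + field.replace(quote_char, quote_char * 2) + quote_char
    if delimiter in field or any(c in field for c in control_chars):
        # Delimiters and line-break characters only need the wrapping quotes.
        return quote_char + field + quote_char
    return field


def format_row(fields, delimiter=",", quote_char='"', terminator="\r\n"):
    """Join the quoted fields and append the line terminator (hypothetical helper)."""
    quoted = [quote_field(f, delimiter, quote_char) for f in fields]
    return delimiter.join(quoted) + terminator


# Writer side: settings resembling the "Unicode text" export described above.
row = ["na\u00efve", 'say "hi"', "tab\there", "plain"]
text = format_row(row, delimiter="\t")
data = text.encode("utf-16")  # the encoding both sides have to agree on

# Reader side: decode with the agreed encoding, then parse with the csv module.
decoded = data.decode("utf-16")
parsed = next(csv.reader(io.StringIO(decoded, newline=""), delimiter="\t"))
```

The quoting itself works on characters and never looks at the encoding,
which is the point being made above: the encoding is a separate
agreement between writer and reader, applied after the row is built and
undone before it is parsed.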