George Sakkis wrote:

> I'm trying to use codecs.open() and I see two issues when I pass
> encoding='utf8':
>
> 1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
> platform-specific byte(s).
>
> import codecs
> f = codecs.open('tmp.txt', 'w', encoding='utf8')
> s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
> print >> f, s
> print >> f, s
> f.close()
>
> This doesn't happen for the default encoding (=None).
>
> 2) csv.writer doesn't seem to work as expected when being passed a
> codecs object; it treats it as if encoding is ascii:
>
> import codecs, csv
> f = codecs.open('tmp.txt', 'w', encoding='utf8')
> s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
> # this works fine
> print >> f, s
> # this doesn't
> csv.writer(f).writerow([s])
> f.close()
>
> Traceback (most recent call last):
> ...
> csv.writer(f).writerow([s])
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
> position 0: ordinal not in range(128)
>
> Is this the expected behavior or are these bugs?

Looking into the documentation

"""
Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters. Accordingly,
all input should be UTF-8 or printable ASCII to be safe; see the examples in
section 9.1.5. These restrictions will be removed in the future.
"""

and into the source code

    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'

I'd be willing to say that both are implementation limitations.

Peter
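
A minimal sketch of the first point, assuming Python 3 (the thread itself uses Python 2 syntax). The temporary file and the variable names are illustrative only. It writes two lines through codecs.open() with an encoding and then reads the raw bytes back, to check that the forced binary mode leaves "\n" untranslated:

```python
import codecs
import os
import tempfile

text = "\u0391\u03b8\u03ae\u03bd\u03b1"  # the sample string from the question

# Create a scratch file (illustrative; any writable path would do).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # With an encoding given, codecs.open() appends 'b' to the mode.
    with codecs.open(path, "w", encoding="utf8") as f:
        f.write(text + "\n")
        f.write(text + "\n")

    # Read the raw bytes back to see what actually reached the disk.
    with open(path, "rb") as raw:
        data = raw.read()
finally:
    os.remove(path)

# Binary mode means no newline translation: each "\n" is stored as a
# single LINEFEED byte, regardless of os.linesep.
line_feed_only = data == (text + "\n").encode("utf8") * 2
print(line_feed_only)
```

If platform-specific line endings are wanted, one option under the same assumptions is to write os.linesep explicitly instead of "\n", since nothing in the binary-mode file object will translate it.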