On 2 May 2007 09:19:25 -0700, [EMAIL PROTECTED] wrote:
>The code:
>
>import codecs
>
>udlASCII = file("c:\\temp\\CSVDB.udl",'r')
>udlUNI = codecs.open("c:\\temp\\CSVDB2.udl",'w',"utf_16")
>
>udlUNI.write(udlASCII.read())
>
>udlUNI.close()
>udlASCII.close()
>
>This doesn't seem to generate the correct line endings. Instead of
>converting 0x0D/0x0A to 0x0D/0x00/0x0A/0x00, it leaves it as
>0x0D/0x0A
>
>I have tried various 2 byte unicode encodings but it doesn't seem to
>make a difference. I have also tried modifying the code to read and
>convert a line at a time, but that didn't make any difference either.
>
>I have tried to understand the unicode docs but nothing seems to
>indicate why a seemingly incorrect conversion is being done.
>Obviously I am missing something blindingly obvious here, any help
>much appreciated.

Consider this simple example:

    >>> import codecs
    >>> f = codecs.open('test-newlines-file', 'w', 'utf16')
    >>> f.write('\r\n')
    >>> f.close()
    >>> f = file('test-newlines-file')
    >>> f.read()
    '\xff\xfe\r\x00\n\x00'
    >>>

Now consider how it differs from your example. Are you sure you're
examining the resulting output properly?

By the way, "\r\0\n\0" isn't a "unicode line ending", it's just the
UTF-16 encoding of "\r\n".

Jean-Paul
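
For anyone repeating the check on a current Python 3, here is a minimal sketch of the same round trip. The function names (convert_to_utf16, raw_bytes, demo) and the file names are made up for illustration and are not from the original post. It assumes the input is pure ASCII, and it opens both files in binary mode so that no newline translation can happen on any platform. The "utf-16" codec writes a byte order mark followed by the machine's native byte order, so the exact bytes can differ between little-endian and big-endian systems.

```python
import os
import tempfile


def convert_to_utf16(src_path, dst_path):
    """Re-encode an ASCII file as UTF-16 (hypothetical helper)."""
    # Binary mode: the bytes are read exactly as stored, "\r\n" included.
    with open(src_path, "rb") as src:
        text = src.read().decode("ascii")
    # "utf-16" emits a BOM, then two bytes per character in native byte order.
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-16"))


def raw_bytes(path):
    """Return the file contents exactly as they are on disk."""
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Write a small CRLF file, convert it, and return the converted bytes."""
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "sample-ascii.txt")  # illustrative names
        dst = os.path.join(tmpdir, "sample-utf16.txt")
        with open(src, "wb") as f:
            f.write(b"a\r\nb\r\n")
        convert_to_utf16(src, dst)
        return raw_bytes(dst)


if __name__ == "__main__":
    # Print the repr so every byte, including the BOM, is visible.
    print(repr(demo()))
```

Printing the repr of the raw bytes, rather than opening the result in an editor, is the point of the exercise: each "\r" and "\n" should show up as its own two-byte unit, with no adjacent 0x0D/0x0A pair left over.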