On Fri, 13 Aug 2010 11:45:28 +0200, Jean-Michel Pichavant wrote:

> I'm trying to update the content of a $Microsoft$ VC2005 project files
> using a python application.
> Since those files are XML data, I assumed I could easily do that.
>
> My problem is that VC somehow thinks that the file is corrupted and
> update the file like the following:
>
> -<?xml version='1.0' encoding='UTF-8'?>
> +?<feff><?xml version="1.0" encoding="UTF-8"?>
>
>
> Actually, <feff> is displayed in a different color by vim, telling me
> that this is some kind of special caracter code (I'm no familiar with
> such thing).

U+FEFF is a "byte order mark" or BOM. Each Unicode-based encoding (UTF-8,
UTF-16, UTF-16-LE, etc.) encodes it differently, so it lets a program
reading the file determine the encoding before reading any actual data.

> My problem is however simplier : how do I add such character at the
> begining of the file ?
> I tried

Either:

1. Open the file as binary and write '\xef\xbb\xbf' to the file:

    f = open('foo.txt', 'wb')
    f.write('\xef\xbb\xbf')
    f.close()

   [You can also use the constant BOM_UTF8 from the codecs module.]

2. Open the file as utf-8 and write u'\ufeff' to the file:

    import codecs
    f = codecs.open('foo.txt', 'w', 'utf-8')
    f.write(u'\ufeff')
    f.close()

3. Open the file as utf-8-sig and don't write anything (or write an empty
   string):

    import codecs
    f = codecs.open('foo.txt', 'w', 'utf-8-sig')
    f.write('')
    f.close()

The utf-8-sig codec automatically writes a BOM at the beginning of the
file. It is present in Python 2.5 and later.
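
For anyone on Python 3, where str and bytes are distinct types, the same
ideas might be sketched roughly as below. The helper names
(write_with_bom, has_utf8_bom), the BOM_BY_ENCODING table and the file
name are made up for illustration; only the built-in open(), str.encode()
and codecs.BOM_UTF8 are real. Treat it as a minimal sketch, not a drop-in
fix for the VC2005 project files.

```python
import codecs

# U+FEFF encoded under a few encodings: each produces different bytes,
# which is what lets a reader guess the encoding from the first bytes.
BOM_BY_ENCODING = {
    name: "\ufeff".encode(name)
    for name in ("utf-8", "utf-16-le", "utf-16-be")
}


def write_with_bom(path, text):
    """Write text as UTF-8 with a leading BOM (hypothetical helper)."""
    # newline="" keeps line endings exactly as given in text.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


if __name__ == "__main__":
    for name, raw in BOM_BY_ENCODING.items():
        print(name, raw)
    write_with_bom("foo.txt", "<?xml version='1.0' encoding='UTF-8'?>\n")
    print(has_utf8_bom("foo.txt"))
```

Reading the file back with encoding="utf-8-sig" strips the BOM again, so
the text your program sees starts directly at "<?xml".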