webcomm wrote:
> I don't know what the character encoding of this data is and don't
> know what the 'FFFD' represents.

The codepoint 0xFFFD is the so-called 'REPLACEMENT CHARACTER'. It is used to replace an incoming character whose value is unknown or unrepresentable in Unicode. The browser might display these if, for example, a page is encoded in latin-1 but claims to be utf-8, so the byte stream contains byte sequences that can't be decoded into Unicode code points.

> I just
> want to scrub it out. I tried this...
>
> clean = txt.encode('ascii','ignore')
>
> ...but the 'FFFD' still comes through.

You must be doing something wrong, then:

py> u'Hello,\ufffd World'.encode('ascii', 'ignore')
'Hello, World'

HTH,

--
Carsten Haese
http://informixdb.sourceforge.net
--
http://mail.python.org/mailman/listinfo/python-list
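
P.S. In case it helps, here is a rough sketch of how the replacement character can end up in a string and two ways to scrub it. It assumes Python 3 (the session above is Python 2, where encode() returns a plain str rather than bytes), and the sample bytes are made up for illustration; they are not the original poster's data.

```python
# Sketch only: Python 3, with made-up sample data.

# "caf" followed by e-acute as a latin-1 byte; this is not valid UTF-8.
raw = b"caf\xe9"

# Decoding with the wrong codec and errors="replace" substitutes
# U+FFFD for the bytes that cannot be decoded.
decoded = raw.decode("utf-8", errors="replace")
print(repr(decoded))

# Option 1: drop every non-ASCII character, as in the reply above.
# In Python 3, encode() returns bytes, so decode back to get a str.
ascii_only = decoded.encode("ascii", "ignore").decode("ascii")

# Option 2: remove only the replacement character and keep any
# other non-ASCII text intact.
scrubbed = decoded.replace("\ufffd", "")

print(repr(ascii_only), repr(scrubbed))
```

Option 2 is probably the gentler choice if the rest of the text may contain legitimate non-ASCII characters. Either way, the character that was replaced is already lost at this point; decoding the original bytes with the correct codec is the only way to recover it.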