Diez B. Roggisch <[EMAIL PROTECTED]> wrote:
>> print try_encodings(text, ['ascii', 'utf-8', 'iso8859_1', 'cp1252',
>> 'macroman'])
>
> I've fallen into that trap before - it won't work after the iso8859_1.
> The reason is that an eight-bit encoding usually has all 256 code points
> assigned (there are exceptions, but you have to be lucky to have
> a string that contains a value not assigned in one of them, which is
> highly unlikely).
>
> AFAIK iso-8859-1 has all code points taken, so you won't go beyond that
> in your example.
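The trap described in the quoted text can be reproduced in a few lines. This is only a minimal sketch in Python 3 syntax; the helper name first_success is hypothetical and is not the try_encodings function from this thread, whose implementation is not shown here.

```python
# Sketch: why nothing listed after iso8859_1 is ever tried.
# first_success is a hypothetical helper, not code from this thread.

def first_success(data, encodings):
    """Return the name of the first encoding that decodes data, or None."""
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# iso8859_1 assigns all 256 byte values, so decoding cannot fail.
all_bytes = bytes(range(256))
assert len(all_bytes.decode("iso8859_1")) == 256

# Bytes 0x93/0x94 are curly quotes in cp1252 but C1 controls in iso8859_1.
sample = b"\x93quoted\x94"
order = ["ascii", "utf-8", "iso8859_1", "cp1252"]
print(first_success(sample, order))  # prints iso8859_1
```

Even though the sample was most likely written as cp1252, iso8859_1 accepts it first, so cp1252 is never reached.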
I pasted from a wrong file :-)
See my previous posting (a few days ago): what I did was to implement
an iso8859_1_ncc encoding (iso8859_1 without control codes), and the line
should have been

    try_encodings(text, ['ascii', 'utf-8', 'iso8859_1_ncc', 'cp1252', 'macroman'])

where iso8859_1_ncc.py is the same as iso8859_1.py from the Python
distribution, with this line changed:

    decoding_map = codecs.make_identity_dict(range(32, 128)+range(128+32, 256))

--
Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/
garabik @ kassiopeia.juls.savba.sk
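For readers on Python 3, where range(...) + range(...) no longer works, the same idea can be sketched without a separate codec module. This is an illustration under assumptions, not the iso8859_1_ncc.py file from the posting: decode_latin1_ncc and this try_encodings are hypothetical stand-ins, and the special-casing of the name "iso8859_1_ncc" replaces real codec registration.

```python
import codecs

# Identity map for bytes 32-127 and 160-255 only, as in the posting.
# Bytes 0-31 and 128-159 stay unmapped, so they raise in strict mode.
DECODING_MAP = codecs.make_identity_dict([*range(32, 128), *range(128 + 32, 256)])


def decode_latin1_ncc(data):
    """Decode as ISO 8859-1, rejecting the unmapped control-code bytes."""
    text, _consumed = codecs.charmap_decode(data, "strict", DECODING_MAP)
    return text


def try_encodings(data, encodings):
    """Hypothetical stand-in: return (name, text) for the first encoding that fits."""
    for name in encodings:
        try:
            if name == "iso8859_1_ncc":
                return name, decode_latin1_ncc(data)
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    return None, None


ENCODINGS = ["ascii", "utf-8", "iso8859_1_ncc", "cp1252", "macroman"]
print(try_encodings(b"caf\xe9", ENCODINGS)[0])           # prints iso8859_1_ncc
print(try_encodings(b"\x93quoted\x94", ENCODINGS)[0])    # prints cp1252
```

With the control-code range excluded, bytes 0x93 and 0x94 are rejected by the restricted Latin-1 step and the list continues on to cp1252. Note that range(32, 128) also leaves tab and newline unmapped, so in this sketch multi-line input containing high bytes would fall through to cp1252 as well.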