Content-Type: text/html; charset=utf-8lias
For Python to parse this, I had to consult Python's list of known encodings
to determine whether I could even parse the site (before passing the
charset name to a string's .encode() method).
You haven't said why you think you need a list of known encodings!
I would have thought that just trying it on some dummy data would let you
determine very quickly whether the alleged encoding is supported by the
Python version etc. that you are using.
E.g.
| >>> alleged_encoding = "utf-8lias"
| >>> "any old ascii".decode(alleged_encoding)
| Traceback (most recent call last):
| File "<stdin>", line 1, in <module>
| LookupError: unknown encoding: utf-8lias
I then try to remap the bogus encoding to one it seems most like
(in this case, utf-8) and retry. Having a list of encodings
allows me to either eyeball or define a heuristic to say "this is
the closest match...try this one instead". That mapping can then
be used to update a mapping file so I don't have to think about
it the next time I encounter the same bogus encoding.
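Something along these lines (a rough sketch of the idea, not my actual
code; REMAP_FILE and resolve_encoding() are just illustrative names, a
JSON file stands in for the mapping file, and difflib.get_close_matches()
stands in for whatever "closest match" heuristic you prefer):

import codecs
import difflib
import json
import os
from encodings.aliases import aliases

REMAP_FILE = "encoding_remap.json"  # hypothetical persistent mapping file

def _load_remap():
    if os.path.exists(REMAP_FILE):
        with open(REMAP_FILE) as f:
            return json.load(f)
    return {}

def _save_remap(remap):
    with open(REMAP_FILE, "w") as f:
        json.dump(remap, f, indent=2)

def resolve_encoding(alleged):
    """Map an alleged charset name to one Python actually knows."""
    remap = _load_remap()
    name = remap.get(alleged.lower(), alleged)
    try:
        # Known (or previously remapped) encoding: use it as-is.
        return codecs.lookup(name).name
    except LookupError:
        pass
    # Heuristic: pick the known codec name/alias that looks most like
    # the bogus one (e.g. "utf-8lias" -> "utf_8"), then remember it.
    known = set(aliases) | set(aliases.values())
    matches = difflib.get_close_matches(
        alleged.lower().replace("-", "_"), known, n=1, cutoff=0.6)
    if not matches:
        raise LookupError("no plausible match for %r" % alleged)
    remap[alleged.lower()] = matches[0]
    _save_remap(remap)
    return codecs.lookup(matches[0]).name

With that, resolve_encoding("utf-8lias") falls through to the heuristic,
picks "utf_8", records the remapping in the file, and returns "utf-8"
both now and the next time the same bogus charset turns up.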
-tkc