encutils 0.2: some basic helper functions for dealing with the encodings of files retrieved via HTTP.

Download from http://cthedot.de/encutils/

Changes in 0.2: Mainly documentation and internal name changes; some parameter names have changed as well.

Currently contained functions:

encodingByMediaType(media_type, log=None)
    Returns a default encoding for the given media type, e.g. 'utf-8' for 'application/xml'.

getHTTPInfo(httpheaders, log=None)
    Returns (media_type, encoding) information from the Content-Type HTTP header of an HTTP header dictionary. May be (None, None), e.g. if no Content-Type header is available. Per RFC 3023, XML documents have a default encoding for various media types if no explicit charset information is given, which may be "ascii" or "utf-8"; see encodingByMediaType. HTML documents have no default encoding.

getMetaInfo(text, log=None)
    Returns (media_type, encoding) information from the (last) X/HTML Content-Type meta element.

guessEncoding(httpheaders, text, log=None)
    Tries to find the encoding of the given text, using information in httpheaders and in the text content, such as HTML meta elements or the XML declaration (the latter is not implemented yet). Returns the explicit or implicit encoding, or None. Mismatch reports are written to the log.

If something similar already exists, please let me know (I know the Cookbook XML autodetection script, which I'd like to integrate in a future version). I would also very much appreciate any feedback about spec compliance, errors or other problems with the functions. (Please use http://cthedot.de/blog/?p=11 or http://cthedot.de/contact/?subject1=encutils.)

Thanks a lot!
chris
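
To make the behaviour described for the functions above a bit more concrete, here is a minimal sketch in plain Python. It is not the encutils source. The names default_encoding, parse_content_type, http_info, meta_info and guess are hypothetical stand-ins. The default table is a simplified reading of RFC 3023. The meta regex assumes http-equiv comes directly before content. Letting the HTTP header win over the meta element is also an assumption.

```python
import logging
import re


def default_encoding(media_type):
    """Implicit encoding for an XML media type, else None.

    Assumption: simplified RFC 3023 defaults, not encutils' own table.
    """
    mt = media_type.lower()
    is_xml = mt.endswith("/xml") or mt.endswith("+xml")
    if is_xml and mt.startswith("text/"):
        return "ascii"
    if is_xml and mt.startswith("application/"):
        return "utf-8"
    return None  # e.g. text/html has no default here


def parse_content_type(value):
    """Split 'type/subtype; charset=x' into (media_type, explicit charset)."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower() or None
    encoding = None
    for param in parts[1:]:
        key, _, val = param.partition("=")
        if key.strip().lower() == "charset" and val.strip():
            encoding = val.strip().strip("\"'").lower()
    return media_type, encoding


def http_info(headers):
    """(media_type, encoding) from a header dict, or (None, None)."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            media_type, encoding = parse_content_type(value)
            if encoding is None and media_type is not None:
                # No explicit charset: fall back to the media-type default.
                encoding = default_encoding(media_type)
            return media_type, encoding
    return None, None


# Hypothetical, deliberately naive pattern: http-equiv must precede content.
_META_RE = re.compile(
    r"""<meta\s+http-equiv\s*=\s*["']?content-type["']?\s+content\s*=\s*["']([^"'>]+)["']""",
    re.IGNORECASE,
)


def meta_info(text):
    """(media_type, encoding) from the last Content-Type meta element."""
    matches = _META_RE.findall(text)
    if not matches:
        return None, None
    return parse_content_type(matches[-1])


def guess(headers, text, log=None):
    """Explicit or implicit encoding, or None; mismatches go to the log."""
    log = log or logging.getLogger("encoding-sketch")
    _, http_enc = http_info(headers)
    _, meta_enc = meta_info(text)
    if http_enc and meta_enc and http_enc != meta_enc:
        log.warning("HTTP says %s, meta element says %s", http_enc, meta_enc)
    # Assumption: the HTTP header takes precedence over the meta element.
    return http_enc or meta_enc
```

For example, guess({"Content-Type": "text/html"}, page) would fall back to whatever charset the page's meta element declares. With "application/xml" and no charset parameter, the sketch returns the implicit 'utf-8'.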