Some basic helper functions to deal with encodings of files retrieved 
via HTTP.

     Download from http://cthedot.de/encutils/


Changes in 0.2:
Mainly documentation and internal name changes; some parameter names 
have changed as well.


Currently contained functions:

encodingByMediaType(media_type, log=None)
     Returns a default encoding for the given Media-Type, e.g. 'utf-8' 
for 'application/xml'.
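
To illustrate the idea of a per-media-type default, here is a minimal 
standalone sketch. It is not the encutils code: the name 
default_encoding_for is hypothetical, and the rules are a simplified 
reading of RFC 3023 (text/* XML types default to "ascii", application/* 
XML types to "utf-8", everything else has no default).

```python
def default_encoding_for(media_type):
    """Hypothetical helper (not part of encutils): default encoding for
    a media type when no explicit charset was given, or None."""
    if not media_type:
        return None
    main, _, sub = media_type.strip().lower().partition("/")
    # Assumption: treat these subtypes as "XML" in the RFC 3023 sense.
    is_xml = (
        sub in ("xml", "xml-external-parsed-entity", "xml-dtd")
        or sub.endswith("+xml")
    )
    if not is_xml:
        return None      # e.g. text/html: no default encoding
    if main == "text":
        return "ascii"   # text/xml and friends
    if main == "application":
        return "utf-8"   # application/xml and friends
    return None
```

With this sketch, default_encoding_for('application/xml') gives 'utf-8' 
and default_encoding_for('text/html') gives None.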

getHTTPInfo(httpheaders, log=None)
     Returns (media_type, encoding) information from the Content-Type 
HTTP header of an HTTP header dictionary. The result may be (None, None), 
e.g. if no Content-Type header is available.
     Per RFC 3023, XML media types have a default encoding when no 
explicit charset information is given, which may be "ascii" or "utf-8"; 
see "encodingByMediaType".
     HTML documents have no default encoding.
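
As a rough illustration of the header parsing step only, the following 
sketch uses the standard library's email.message.Message to split a 
Content-Type value. It is an assumption about one way to do it, not the 
encutils implementation; parse_content_type is a hypothetical name, and 
the sketch reports only an explicit charset (no media-type defaults).

```python
from email.message import Message


def parse_content_type(headers):
    """Hypothetical helper (not part of encutils): return
    (media_type, encoding) from a dict of HTTP headers."""
    value = None
    for name, header_value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-type":
            value = header_value
            break
    if value is None:
        return (None, None)
    msg = Message()
    msg["Content-Type"] = value
    # get_content_type() lowercases the type and falls back to
    # 'text/plain' for a malformed value; get_content_charset()
    # returns the lowercased charset parameter or None.
    return (msg.get_content_type(), msg.get_content_charset())
```

For example, a dictionary containing 
{"Content-Type": "text/html; charset=ISO-8859-1"} yields 
("text/html", "iso-8859-1"), and an empty dictionary yields (None, None).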

getMetaInfo(text, log=None)
     Returns (media_type, encoding) information from the (last) X/HTML 
Content-Type meta element.
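
The "last meta element wins" behaviour can be sketched with the standard 
library's html.parser. Again this is only an illustration under my own 
assumptions, not the encutils code; meta_info and _MetaContentType are 
hypothetical names.

```python
from email.message import Message
from html.parser import HTMLParser


class _MetaContentType(HTMLParser):
    """Remembers the content attribute of the last
    <meta http-equiv="Content-Type" ...> element seen."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            # Later elements overwrite earlier ones: the last one wins.
            self.content = attributes.get("content")


def meta_info(text):
    """Hypothetical helper (not part of encutils): return
    (media_type, encoding) from the last Content-Type meta element."""
    parser = _MetaContentType()
    parser.feed(text)
    parser.close()
    if not parser.content:
        return (None, None)
    msg = Message()
    msg["Content-Type"] = parser.content
    return (msg.get_content_type(), msg.get_content_charset())
```

The XHTML form <meta ... /> is covered as well, because HTMLParser 
routes empty-element tags through handle_starttag by default.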

guessEncoding(httpheaders, text, log=None)
     Tries to find the encoding of the given text, using information in 
httpheaders and in the text content, such as HTML meta elements or the 
XML declaration (the XML declaration is not implemented yet). Returns the 
explicit or implicit encoding or None. Mismatch reports are written to 
the log.
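
The mismatch reporting could look roughly like the sketch below. It is 
not the encutils code: reconcile_encodings is a hypothetical name, and 
letting the HTTP value take precedence over the meta value is my 
assumption for the example, not something stated above.

```python
import codecs
import logging


def _canonical(name):
    """Normalise an encoding name so that e.g. 'UTF8' and 'utf-8'
    compare equal; unknown names are just lowercased."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def reconcile_encodings(http_encoding, meta_encoding, log=None):
    """Hypothetical helper (not part of encutils): pick one encoding
    from the two sources and log a warning if they disagree."""
    log = log or logging.getLogger("encoding-sketch")
    if http_encoding and meta_encoding:
        if _canonical(http_encoding) != _canonical(meta_encoding):
            log.warning(
                "Encoding mismatch: HTTP header says %r, meta element says %r",
                http_encoding,
                meta_encoding,
            )
    # Assumption: the HTTP header wins when both sources are present.
    return http_encoding or meta_encoding or None
```

Passing None for both sources returns None, which matches the "or None" 
case in the description.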


If something similar already exists, please let me know (I know of the 
Cookbook XML autodetection script, which I'd like to integrate in a 
future version).

And I would very much appreciate any feedback about spec compliance, 
errors or other problems with the functions too. (Please use 
http://cthedot.de/blog/?p=11 or 
http://cthedot.de/contact/?subject1=encutils).

Thanks a lot!
chris
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

        Support the Python Software Foundation:
        http://www.python.org/psf/donations.html
