Some basic helper functions for dealing with the encodings of text files
(such as HTML, XHTML, XML) retrieved via HTTP. Developed for cssutils, but
it looked worth an independent release.

Download from http://cthedot.de/encutils/
Some unit tests are included.


License
        Creative Commons License
        http://creativecommons.org/licenses/by/2.0/


Functions:
Note: All encodings returned are uppercase.


encodingByMediaType(media_type, log=None)

     Returns a default encoding for the given media type, e.g. 'UTF-8'
for media_type='application/xml'. Returns None if no default encoding
is available.
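
     A minimal sketch of such a lookup, not the encutils implementation.
The name default_encoding_for and the table contents are assumptions made
for illustration; only the 'application/xml' entry comes from the text
above, and the 'text/xml' entry follows the RFC 3023 default.

```python
# Illustrative table only; the defaults encutils really uses may differ.
DEFAULT_ENCODINGS = {
    "application/xml": "UTF-8",   # example given in the announcement
    "text/xml": "US-ASCII",       # assumption: RFC 3023 default without charset
}


def default_encoding_for(media_type):
    """Return an uppercase default encoding for media_type, or None."""
    if not media_type:
        return None
    # Media types are case-insensitive, so normalise before the lookup.
    return DEFAULT_ENCODINGS.get(media_type.strip().lower())
```

     A media type that is not in the table, such as 'image/png', simply
yields None.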


getHTTPInfo(HTTPResponse, log=None)

     Returns (media_type, encoding) information from the response's
Content-Type HTTP header (header names are matched case-insensitively).
May be (None, None), e.g. if no Content-Type header is available.
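
     The idea can be sketched with the standard library alone. This is
an illustration, not the encutils code: the function name http_info is
hypothetical, and it takes a plain mapping of header names to values
rather than a real HTTPResponse object.

```python
from email.message import Message


def http_info(headers):
    """Return (media_type, encoding) from a header mapping, or (None, None)."""
    content_type = None
    for name, value in headers.items():
        # Header names are compared case-insensitively.
        if name.lower() == "content-type":
            content_type = value
            break
    if not content_type:
        return (None, None)

    # Let the email package split the media type from its parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    return (msg.get_content_type(), charset.upper() if charset else None)
```

     A header without a charset parameter gives the media type and None
for the encoding.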

getMetaInfo(text, log=None)

     Returns (media_type, encoding) information from the first X/HTML
Content-Type <meta> element, if available.
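
     One possible way to find that element, sketched with the standard
library parser. The names _MetaFinder and meta_info are hypothetical, and
encutils itself may well locate the element differently.

```python
from email.message import Message
from html.parser import HTMLParser


class _MetaFinder(HTMLParser):
    """Remember the content attribute of the first Content-Type <meta>."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.content is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            self.content = attributes.get("content")


def meta_info(text):
    """Return (media_type, encoding) from the first such <meta>, if any."""
    finder = _MetaFinder()
    finder.feed(text)
    if not finder.content:
        return (None, None)
    # Reuse the email package to split the media type from the charset.
    msg = Message()
    msg["Content-Type"] = finder.content
    charset = msg.get_content_charset()
    return (msg.get_content_type(), charset.upper() if charset else None)
```

     XHTML-style empty elements (<meta ... />) are handled as well,
because the parser reports them through the same start-tag callback.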


getXMLEncoding(text, log=None)

     Parses the XML declaration of a document, if present (simplified
parsing). Returns (encoding, explicit).
     No autodetection via BOM is done yet. If no explicit encoding is
found, returns ('UTF-8', False).
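
     A simplified parse of this kind could look like the sketch below.
The regular expression and the name xml_encoding are assumptions for
illustration, not the pattern encutils uses, and like the function
described above it ignores any BOM.

```python
import re

# Simplified: expects the declaration at the very start of the text and
# a version attribute, optionally followed by an encoding attribute.
_XML_DECL = re.compile(
    r"""^<\?xml\s+version\s*=\s*(["'])[^"']*\1"""
    r"""(?:\s+encoding\s*=\s*(["'])(?P<enc>[A-Za-z][A-Za-z0-9._-]*)\2)?"""
)


def xml_encoding(text):
    """Return (encoding, explicit) for an XML document given as str."""
    match = _XML_DECL.match(text)
    if match and match.group("enc"):
        return (match.group("enc").upper(), True)
    # No declaration or no encoding attribute: fall back to the XML default.
    return ("UTF-8", False)
```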


guessEncoding(HTTPResponse, text, log=None)

     Tries to find the encoding of the given text. Uses information in
the headers of the supplied HTTPResponse, a possible XML declaration and
X/HTML <meta> elements.
     Returns (encoding, mismatch). Encoding is the explicit or implicit
encoding, or None, and is always returned in uppercase. Mismatch is True
if any mismatches between media_type, XML declaration or text content
are found. More detailed mismatch reports are written to the optional
log.
     Mismatches are not necessarily errors! For details see the
specifications.
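
     How several sources might be combined into one (encoding,
mismatch) result is sketched below. The precedence shown (HTTP header,
then XML declaration, then <meta>, then a media-type default) is an
assumption for illustration and not necessarily the order encutils
applies; combine_encodings is a hypothetical name.

```python
def combine_encodings(http_enc, xml_enc, meta_enc, default_enc=None):
    """Merge per-source encodings into (encoding, mismatch).

    Each argument is an uppercase encoding name, or None if that source
    did not state an encoding explicitly.
    """
    explicit = [enc for enc in (http_enc, xml_enc, meta_enc) if enc]
    # Any disagreement between explicit sources is reported as a mismatch.
    mismatch = len(set(explicit)) > 1
    # Assumed precedence: the first explicit source wins, else the default.
    encoding = explicit[0] if explicit else default_enc
    return (encoding, mismatch)
```

     With an HTTP charset of 'UTF-8' and a <meta> charset of
'ISO-8859-1', this sketch returns 'UTF-8' and flags a mismatch, which,
as noted above, need not be an error.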


The plan is to integrate XML autodetection (of the BOM) in the next release.


I would very much welcome any feedback about spec compliance, errors or 
other problems with the functions (or the tests!).
Please use http://cthedot.de/blog/?cat=14 or http://cthedot.de/contact/.

Thanks a lot!
chris


encutils 0.4 (http://cthedot.de/encutils/) - basic helper functions to
deal with encodings of text files (17-Aug-05)