Andreas Jung <[EMAIL PROTECTED]> wrote:
> Does anyone know of a Python module that is able to sniff the encoding of 
> text? Please: I know that there is no reliable way to do this but I need 
> something that works for most of the case...so please no discussion about 
> the sense of such a module and approach.
> 

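it depends on what exactly you need.
one approach is pyenca.

the other is to try decoding the text with each candidate encoding in turn and return the first one that succeeds: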

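def try_encoding(s, encodings):
    """try to guess the encoding of byte string s, testing the encodings
    given in the second parameter in order; return None if none fit"""

    for enc in encodings:
        try:
            s.decode(enc)  # raises UnicodeDecodeError if s is invalid in enc
            return enc
        except UnicodeDecodeError:
            pass

    return None

# iso8859_1 accepts any byte sequence, so cp1252 and macroman
# after it in this list are never reached
print(try_encoding(text, ['ascii', 'utf-8', 'iso8859_1', 'cp1252', 'macroman']))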
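The order of the list matters because try_encoding returns the first encoding that decodes without error. A quick check, written for Python 3 where decode is called on bytes (the helper name decodes_cleanly is just for illustration), shows why iso8859_1 acts as a catch-all while cp1252 can still fail:

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes under `encoding` with strict error handling."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


every_byte = bytes(range(256))

# iso8859_1 maps all 256 byte values to characters, so it never fails
print(decodes_cleanly(every_byte, 'iso8859_1'))  # True

# Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined
print(decodes_cleanly(b'\x81', 'cp1252'))  # False
```

so a strict encoding like ascii or utf-8 belongs near the front of the list, and at most one catch-all single-byte encoding belongs at the end.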


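depending on what language and encodings you expect the text to be in,
the first or the second approach is better. with the second, the first
encoding that decodes without error wins, so put strict encodings
(ascii, utf-8) first and a catch-all such as iso8859_1 last.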
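Trial decoding cannot tell similar single-byte encodings apart, because several of them accept the same bytes. A variant that returns every candidate that decodes cleanly, instead of only the first, makes that ambiguity visible. This is a minimal Python 3 sketch; candidate_encodings is a hypothetical name, not part of pyenca or the standard library:

```python
from typing import List


def candidate_encodings(data: bytes, encodings: List[str]) -> List[str]:
    """Return every encoding in `encodings` under which `data` decodes
    without error, keeping the caller's order."""
    matches = []
    for enc in encodings:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        matches.append(enc)
    return matches


candidates = ['ascii', 'utf-8', 'iso8859_1', 'cp1252', 'macroman']

# 'café' encoded as cp1252 is b'caf\xe9': not ASCII, not valid UTF-8,
# but every single-byte candidate accepts it
print(candidate_encodings('café'.encode('cp1252'), candidates))
# ['iso8859_1', 'cp1252', 'macroman']

# the same word encoded as UTF-8 is also "valid" in all three single-byte encodings
print(candidate_encodings('café'.encode('utf-8'), candidates))
# ['utf-8', 'iso8859_1', 'cp1252', 'macroman']
```

when more than one single-byte encoding survives, trial decoding alone cannot choose between them; that is where knowing the expected language, or a detector such as pyenca, helps.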


-- 
 -----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
-- 
http://mail.python.org/mailman/listinfo/python-list
