dirknbr <dirknbr <at> gmail.com> writes:

> I have kind of developed this but obviously it's not nice, any better
> ideas?
> 
>         try:
>             text=texts[i]
>             text=text.encode('latin-1')
>             text=text.encode('utf-8')
>         except:
>             text=' '

As Steven has pointed out, if the .encode('latin-1') succeeds, its result is
thrown away. That is actually fortunate, given what the code appears to be
trying to do.

It appears that your goal was to encode the text in latin1 if possible,
otherwise in UTF-8, with no indication of which encoding was used. Your second
posting confirmed that you were doing this in a loop, ending up with the
possibility that your output file would have records with mixed encodings.
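To make the mixed-encoding problem concrete, here is a minimal sketch of the "latin1 if possible, otherwise UTF-8" strategy. It assumes Python 3 str objects (the original thread predates that), and the helper name encode_latin1_else_utf8 and the sample strings are made up for illustration; this is not the poster's actual code.

```python
def encode_latin1_else_utf8(text):
    """Hypothetical helper: try latin-1 first, fall back to UTF-8."""
    try:
        return text.encode('latin-1')
    except UnicodeEncodeError:
        # Characters outside latin-1 (e.g. the euro sign) end up here.
        return text.encode('utf-8')

# Sample data (assumed): the first string fits in latin-1, the second does not.
texts = ['caf\u00e9', '\u20ac5']
records = [encode_latin1_else_utf8(t) for t in texts]

# The two records now use different encodings, and nothing in the
# bytes themselves says which one was used.
print(records)
```

The first record comes out as latin-1 bytes and the second as UTF-8 bytes, which is exactly the mixed output described above.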

Did you consider what a programmer writing code to READ your output file would
need to do, e.g. attempt to decode each record as UTF-8 with a fall-back to
latin1? Did you consider what would be the result of sending a stream of
mixed-encoding text to a display device?
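For illustration, the reader's side might look something like the sketch below. It assumes Python 3 bytes objects, and decode_record is a hypothetical name. Note that the fall-back is only a guess: some latin-1 byte sequences are also valid UTF-8, so the reader can silently get the wrong text back.

```python
def decode_record(raw):
    """Hypothetical reader: guess UTF-8 first, fall back to latin-1."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this never fails.
        return raw.decode('latin-1')

# A record that was written as latin-1 but also happens to be valid UTF-8.
original = '\u00c3\u00a9'           # the two characters A-tilde, copyright sign
raw = original.encode('latin-1')    # bytes 0xC3 0xA9
recovered = decode_record(raw)      # decoded as UTF-8: a single e-acute

print(recovered == original)        # the guess did not round-trip
```

So even with the fall-back in place, the reader cannot reliably recover what the writer meant.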

As already advised, the short answer is to avoid all of that hassle: just
encode in UTF-8.
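A minimal sketch of that approach, again assuming Python 3 str objects and made-up sample data: every record is encoded the same way, so no try/except is needed and the reader can decode every record the same way too.

```python
# Sample data (assumed); UTF-8 can encode any of these.
texts = ['caf\u00e9', '\u20ac5', 'plain ascii']

# One encoding for every record.
encoded = [t.encode('utf-8') for t in texts]

# The reader decodes each record as UTF-8 and gets the original text back.
decoded = [b.decode('utf-8') for b in encoded]
print(decoded == texts)
```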



-- 
http://mail.python.org/mailman/listinfo/python-list