Re: [python] Printing a unicode string and printing a list of unicode strings

Petr Přikryl Wed, 08 Jun 2011 00:25:27 -0700

"David Rohleder" wrote:

[...]

> Speaking of strings: how do you make a raw unicode string, i.e. one in which
> the interpreter does not interpret the backslashes inside? I am generating a
> LaTeX document and it naturally complains about:
>
> hlavicka = ur"""
> \documentclass[a4,landscape]{article}
> \usepackage{graphicx}
> \usepackage[czech]{babel}
> \usepackage[utf8]{inputenc}
> \begin{document}
> \thispagestyle{empty}
> """
>
> print hlavicka
>
> SyntaxError: (unicode error) 'rawunicodeescape' codec can't decode bytes
> in position 39-40: truncated \uXXXX


This is one of those dark corners. It already complains about

hlavicka = ur'\usepackage'
print hlavicka
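For comparison, a small Python 3 sketch (not part of the original thread) of how the different prefixes treat the same "\u". It goes through compile() only so that the failing literals can be caught instead of stopping the script; the helper name try_compile is made up for this illustration.

```python
# Python 3 sketch: how each string prefix treats the "\u" in "\usepackage".
# try_compile is a hypothetical helper for this demo, not a library function.

def try_compile(source):
    """Compile and run one assignment to x; return x, or the SyntaxError raised."""
    namespace = {}
    try:
        exec(compile(source, "<demo>", "exec"), namespace)
    except SyntaxError as exc:
        return exc
    return namespace["x"]


# Ordinary literal: "\u" starts a \uXXXX escape and "sepa" is not hex -> error.
plain = try_compile(r"x = '\usepackage'")
# Raw literal: in Python 3 the backslash is kept verbatim.
raw = try_compile(r"x = r'\usepackage'")
# The ur prefix does not exist in Python 3, so this is plain invalid syntax.
ur = try_compile(r"x = ur'\usepackage'")

for label, result in (("plain", plain), ("raw", raw), ("ur", ur)):
    print(label, repr(result))
```

So in Python 3 only the plain r prefix is needed; the ur combination that causes the trouble here is gone.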

The problem is that raw-unicode-escape with the ur prefix is not entirely 'raw'.

Sequences of the form \uXXXX are still interpreted; see http://docs.python.org/tutorial/introduction.html#unicode-strings:

"For experts, there is also a raw mode just like the one for normal strings. You have to prefix the opening quote with ‘ur’ to have Python use the Raw-Unicode-Escape encoding. It will only apply the above \uXXXX conversion if there is an uneven number of backslashes in front of the small ‘u’."
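The codec behind Python 2's ur"..." literals still exists in Python 3 as 'raw_unicode_escape', so the "uneven number of backslashes" rule can be tried directly on bytes. A sketch under that assumption; decode_raw is a hypothetical helper, and the first byte string is just the beginning of the header from the question.

```python
# Python 3 sketch: the raw_unicode_escape codec only converts \uXXXX when an
# odd number of backslashes precedes the "u"; other backslashes pass through.
# decode_raw is a hypothetical helper for this demo.

def decode_raw(data):
    """Decode bytes with raw_unicode_escape; return the text or the error."""
    try:
        return data.decode("raw_unicode_escape")
    except UnicodeDecodeError as exc:
        return exc


# Beginning of the header from the question, as bytes.
header = b"\n\\documentclass[a4,landscape]{article}\n\\usepackage{graphicx}\n"
err = decode_raw(header)
print(err)  # the "\u" of "\usepackage" sits at byte offsets 39-40

print(repr(decode_raw(b"\\u0041")))          # one backslash: converted to "A"
print(repr(decode_raw(b"\\\\u0041")))        # two backslashes: left alone
print(repr(decode_raw(b"\\\\\\u0041")))      # three: a pair stays, then \u0041 -> "A"
print(repr(decode_raw(b"\\documentclass")))  # no "u" after it: kept verbatim
```

This matches the position 39-40 in the original error message: \documentclass is harmless, but \usepackage starts a \u escape with non-hex digits.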

In this case the only option I can think of is to first create a raw string (the old, non-unicode kind) and then convert it to unicode in a second step:

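==============================
# -*- coding: utf-8 -*-

# Step 1: an ordinary raw byte string; the backslashes are kept as-is.
hlav = r"""
\documentclass[a4,landscape]{article}
\usepackage{graphicx}
\usepackage[czech]{babel}
\usepackage[utf8]{inputenc}
\begin{document}
\thispagestyle{empty}
"""
print type(hlav)                    # <type 'str'>
# Step 2: decode the bytes as UTF-8; this codec ignores backslashes.
hlavicka = unicode(hlav, 'utf-8')
print type(hlavicka)                # <type 'unicode'>
print hlavicka
==============================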
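In Python 3 the same two-step trick would use a raw bytes literal (rb"...", accepted since Python 3.3) and bytes.decode(). This is only a sketch for comparison: a bytes literal may contain ASCII characters only, so Czech text cannot be typed into it directly.

```python
# Python 3 sketch of the two-step approach: raw bytes first, then decode.
# The 'utf-8' codec treats "\" as an ordinary byte, unlike raw_unicode_escape.
hlav = rb"""
\documentclass[a4,landscape]{article}
\usepackage{graphicx}
\usepackage[czech]{babel}
\usepackage[utf8]{inputenc}
\begin{document}
\thispagestyle{empty}
"""

hlavicka = hlav.decode("utf-8")  # step 2: bytes -> str, backslashes untouched
print(type(hlav), type(hlavicka))
print(hlavicka)
```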

It can be tidied up a little by defining a function u(), so that it reads more nicely:


==============================
# -*- coding: utf-8 -*-

def u(raw_string, encoding='utf-8'):
    # Decode a (raw) byte string to unicode; backslashes are left alone.
    return unicode(raw_string, encoding)


hlavicka = u(r"""
\documentclass[a4,landscape]{article}
\usepackage{graphicx}
\usepackage[czech]{babel}
\usepackage[utf8]{inputenc}
\begin{document}
\thispagestyle{empty}
""")

print type(hlavicka)
print hlavicka
==============================

For crunching LaTeX texts, though, I would seriously consider using
Python 3, even if only for this purpose. There it becomes just a raw
string, which is automatically unicode (the new str corresponds to
the old unicode type):

==============================
# -*- coding: utf-8 -*-

# In Python 3 a raw string literal is already a unicode str.
hlavicka = r"""
\documentclass[a4,landscape]{article}
\usepackage{graphicx}
\usepackage[czech]{babel}
\usepackage[utf8]{inputenc}
\begin{document}
\thispagestyle{empty}
"""

print(type(hlavicka))    # <class 'str'>
print(hlavicka)
==============================
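Since the goal is to generate a LaTeX document, here is a sketch of writing such a header to a file in Python 3. The file name vystup.tex, the temporary directory and the sample Czech sentence are illustrative assumptions only; the explicit encoding="utf-8" is there so the file matches \usepackage[utf8]{inputenc}.

```python
# Python 3 sketch: raw-string header + Czech text written to a .tex file.
# "vystup.tex" and the body sentence are only examples.
import os
import tempfile

hlavicka = r"""
\documentclass[a4,landscape]{article}
\usepackage{graphicx}
\usepackage[czech]{babel}
\usepackage[utf8]{inputenc}
\begin{document}
\thispagestyle{empty}
"""

telo = "Příliš žluťoučký kůň úpěl ďábelské ódy.\n"  # non-ASCII is fine in a str
paticka = r"\end{document}" + "\n"

path = os.path.join(tempfile.mkdtemp(), "vystup.tex")
# Encode explicitly as UTF-8 instead of relying on the platform default.
with open(path, "w", encoding="utf-8") as f:
    f.write(hlavicka + telo + paticka)

# Read it back to check that backslashes and accents survived.
with open(path, encoding="utf-8") as f:
    obsah = f.read()
print(obsah)
```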

If you have more of that crunching code, Python's 2to3
utility will help; see
http://diveintopython3.py.cz/case-study-porting-chardet-to-python-3.html#running2to3

P.

_______________________________________________
Python mailing list
Python@py.cz
http://www.py.cz/mailman/listinfo/python
