Re: How to print first(national) char from unicode string encoded in utf-8?
2008/9/1 [EMAIL PROTECTED]:
> Hi,
>
> I have a problem with a unicode string in Pylons templates (Mako). I want to print the first character of my string, which is encoded in UTF-8 and passed through urllib.quote(). For example, for the string 'Łukasz':
>
>     ${urllib.unquote(c.user.firstName).encode('latin-1')[0:1]}
>
> I get this error:
>
>     type 'exceptions.UnicodeDecodeError': 'utf8' codec can't decode byte 0xc5 in position 0: unexpected end of data
>
> When I change [0:1] to [0:2], everything is OK. I think it is because of unicode and the UTF-8 encoding (2 bytes). How can I resolve this problem?
>
> Best regards

First: you're talking about utf8 encoding, but you've written latin1 encoding. I do not know Mako templates, but as far as I can tell there should be no problem in your snippet of code if the encoding is latin1.

Do not assume utf8 is a two-byte encoding; utf8 is a variable-length encoding. Indeed:

'a' encoded as utf8 is 'a' (one byte)
'à' encoded as utf8 is '\xc3\xa0' (two bytes)

Can you explain what you're trying to accomplish (rather than how you're trying to accomplish it)?

Regards
Marco

--
Marco Bizzarri
http://notenotturne.blogspot.com/
http://iliveinpisa.blogspot.com/
--
http://mail.python.org/mailman/listinfo/python-list
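To illustrate the variable-length point, here is a minimal sketch. It is written for Python 3, whereas the thread concerns Python 2, and the helper names utf8_length and decodes_as_utf8 are made up for this example. It shows that 'Ł' takes two bytes in UTF-8, so a one-byte slice of the encoded string is not valid UTF-8 on its own, which is consistent with the "unexpected end of data" error quoted above.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the byte string is complete, valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = "Łukasz"
encoded = name.encode("utf-8")  # b'\xc5\x81ukasz'

# Slicing the *bytes* cuts the two-byte 'Ł' in half.
half_character = encoded[0:1]   # b'\xc5'
# Slicing the *text* keeps the character whole.
first_character = name[0:1]     # 'Ł'
```

Slicing before encoding (on text) rather than after (on bytes) avoids having to know how many bytes each character needs.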
Re: How to print first(national) char from unicode string encoded in utf-8?
On 1 Sep, 15:10, Marco Bizzarri [EMAIL PROTECTED] wrote:
> Can you explain what you're trying to accomplish (rather than how you're trying to accomplish it)?

When I use ${urllib.unquote(c.user.firstName)} without encoding to latin-1, I get different characters than I expect: not Łukasz but Åukasz.
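The 'Åukasz' output is what one would expect if the UTF-8 bytes of 'Ł' were being interpreted as Latin-1 somewhere along the way. The following Python 3 sketch reproduces that effect on a plain string; it makes no claim about where in Pylons or Mako the misreading actually happens.

```python
# UTF-8 bytes for the name: 'Ł' becomes the two bytes 0xC5 0x81.
utf8_bytes = "Łukasz".encode("utf-8")

# Reading those bytes as Latin-1 turns each byte into one character:
# 0xC5 is 'Å' and 0x81 is an invisible control character.
misread = utf8_bytes.decode("latin-1")

# Reversing the wrong step recovers the original text.
repaired = misread.encode("latin-1").decode("utf-8")
```

Because the second character is a non-printing control code, the misread name shows up as 'Åukasz' (sometimes with a gap after the 'Å').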
Re: How to print first(national) char from unicode string encoded in utf-8?
On Mon, Sep 1, 2008 at 3:25 PM, [EMAIL PROTECTED] wrote:
> When I use ${urllib.unquote(c.user.firstName)} without encoding to latin-1, I get different characters than I expect: not Łukasz but Åukasz.

That's crazy. string.encode('latin1') gives you a latin1-encoded string; latin1 is a single-byte encoding, therefore taking the first byte should be no problem.

Have you tried:

    urllib.unquote(c.user.firstName)[0].encode('latin1')

or

    urllib.unquote(c.user.firstName)[0].encode('utf8')

I'm assuming here that urllib.unquote(c.user.firstName) returns an encodable string (which I'm not at all sure of), but if it does, this should take the first 'character'.

Regards
Marco

--
Marco Bizzarri
http://notenotturne.blogspot.com/
http://iliveinpisa.blogspot.com/
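For comparison, here is how "unquote, then take the first character" might look in Python 3, where urllib.parse.unquote decodes percent-escapes as UTF-8 by default and returns text. This is only a sketch: the function name first_char is hypothetical, and it assumes the stored value is a percent-encoded UTF-8 string such as '%C5%81ukasz'. The Python 2 urllib.unquote discussed in this thread behaves differently.

```python
from urllib.parse import quote, unquote


def first_char(quoted: str) -> str:
    """Return the first character of a percent-encoded UTF-8 string.

    Assumes `quoted` was produced by percent-encoding UTF-8 text.
    Returns an empty string for empty input.
    """
    text = unquote(quoted, encoding="utf-8")  # decode to text first
    return text[:1]                           # then slice characters, not bytes


stored = quote("Łukasz")      # '%C5%81ukasz'
initial = first_char(stored)  # 'Ł'
```

The point of the ordering is that the slice is taken on decoded text, so a multi-byte character is never split.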